Fuzzy-Based Spatiotemporal Hot Spot Intensity and Propagation—An Application in Crime Analysis

Abstract: Cluster-based hot spot detection is applied in many disciplines to analyze the locations, concentrations, and evolution over time for a phenomenon occurring in an area of study. The hot spots consist of areas within which the phenomenon is most present; by detecting and monitoring the presence of hot spots in different time steps, it is possible to study their evolution over time. One of the most prominent problems in hot spot analysis occurs when measuring the intensity of a phenomenon in terms of the presence and impact on an area of study and evaluating its evolution over time. In this research, we propose a hot spot analysis method based on a fuzzy cluster hot spot detection algorithm, which allows us to measure the incidence of hot spots in the area of study. We analyze its variation over time, and in order to evaluate its reliability we use a well-known fuzzy entropy measure that was recently applied to measure the reliability of hot spots by executing fuzzy clustering algorithms. We apply this method in crime analysis of the urban area of the City of London, using a dataset of criminal events that have occurred since 2011, published by the City of London Police. The obtained results show a decrease in the frequency of all types of criminal events over the entire area of study in recent years.


Introduction
Hot spot detection is a spatial analysis method aimed at detecting regions on a map, called hot spots, within which a high concentration of events characterizing a specific phenomenon is localized. Each event is spatially referenced and geometrically represented as a point on the map; cluster techniques are often applied to the dataset of events to detect cluster prototypes representing hot spots on the map.
Clustering methods are generally used to detect hot spots. The data points are made up of events assigned as elements with point geometry on the map. Clustering algorithms are used to locate and construct hot spots as elements with polygonal geometry on a map, corresponding to regions of the study area where the phenomenon is most insistent. Moreover, by analyzing the location and extension of hot spots detected in successive time frames, it is possible to study their evolution over time.
Some researchers apply density-based clustering so that irregularly shaped hot spots can also be detected on the map. Kernel density-based algorithms [20] are applied in crime analysis [21], soil pollution [22], and traffic accident analysis [23]. The fast DBSCAN algorithm [24] is applied in [25] to detect hot spots with a high density of taxi passengers.
As a trade-off between the speed of execution of K-means and FCM and the accuracy in detecting the outline of hot spots obtained using density-based algorithms, in [26] a cluster-based hot spot detection method based on the extended FCM algorithm [27] (for short EFCM) is proposed; EFCM is an extension of the FCM algorithm in which cluster prototypes are hyperspheres in the feature space, rather than points, as in K-means and FCM. In [26], the hot spots are given by circles on the map; the authors show that these circles approximate the shape of the clusters detected by using density-based clustering. The EFCM hot spot detection algorithm is applied in [28] for disease analysis and in [29] for earthquake disaster analysis.
In particular, in [28] a method based on EFCM is proposed for spatiotemporal hot spot analysis. The dataset of events is partitioned into subsets, where each subset contains the events that occurred in one time frame. By analyzing the location and extent of the hot spots detected at each time step on the map, their displacement over time is traced; in addition, by computing the spatial intersections between hot spots detected at consecutive time frames, it is possible to analyze in which geographical areas the phenomenon is persistent and to which geographical areas it has moved.
One of the major critical points in hot spot analysis is to define a measure of the intensity of the phenomenon on a specific area and to assess the reliability of this measure. The hot spot analysis algorithms proposed in the recent literature neither measure the intensity of the phenomenon over an area nor evaluate how reliable such a measure is. In [30], an index evaluating the reliability of hot spots detected via EFCM clustering is proposed. This index measures the reliability of the hot spots by measuring the fuzziness of the clustering, evaluated using the De Luca and Termini fuzzy entropy of a fuzzy set [31,32]. In [33,34], the De Luca and Termini fuzzy entropy is used to measure the fuzziness of the clustering obtained by executing FCM; each fuzzy cluster constitutes a fuzzy set and its fuzzy entropy is measured by considering the membership degrees of the data points to it, so that the closer the cluster is to a crisp set, the lower its fuzziness.
In this paper, we propose a new method to analyze the intensity and spatiotemporal evolution of hot spots detected using the EFCM spatiotemporal hot spot detection proposed in [28]. We assess the incidence over time of the phenomenon analyzed in a specific area by calculating an index called hot spot strength, which measures the percentage over time that the selected area is affected by hot spots. In addition, we measure the reliability of this evaluation, calculating a reliability index of the hot spot strength based on the hot spot reliability measure proposed in [30].
The main contributions of our research are summarized below:
- In addition to analyzing the evolution of the phenomenon in a selected area for each time frame, our method evaluates with what intensity this area has been affected by the phenomenon in a given period of time; we measure this intensity by calculating the hot spot strength index. This measure is essential in an application context to understand with what intensity a certain phenomenon is spreading over an area of investigation;
- A measure of the reliability of the hot spot strength is proposed, using the reliability index of the hot spot [30] to assess the reliability of the hot spot strength measured in a time frame; it is given by the weighted average of the reliability of the hot spots that cover the selected area in this time frame, where the weight assigned to a hot spot is given by the extent of its spatial intersection with the selected area.
The EFCM hot spot detection algorithm and the fuzzy entropy hot spot reliability measure are summarized in Section 2. In Section 3, we present our method. Section 4 shows the results of its application in crime analysis on an area of study given by the City of London. Final considerations and future perspectives are included in Section 5.

EFCM Hot Spot Detection
Let $X = \{x_1, \ldots, x_N\} \subset \mathbb{R}^2$ be a set of bi-dimensional data points extracted from a spatial event dataset. Each data point is a spatially referenced event given by its latitude and longitude coordinates.
EFCM returns cluster prototypes made of hyperspheres in the feature space. An initial number of clusters $C^{(0)}$ is assigned in EFCM; the optimal number of clusters $C$ is found by merging, during each iteration, the two clusters most similar to each other if their similarity is greater than a prefixed threshold $\eta$.
Let $V = \{v_1, \ldots, v_C\} \subset \mathbb{R}^2$ be the set of centers of the $C$ clusters. Let $U$ be the $C \times N$ partition matrix, where $u_{ij}$ is the membership degree of the $j$th data point $x_j$ to the $i$th cluster $v_i$. Let $r = \{r_1, \ldots, r_C\}$ be the set of radii of the $C$ clusters.
EFCM minimizes an objective function (1) of the FCM type, in which $m$ is the fuzzifier parameter and $\delta_{ij}$ is interpreted as the distance between the $i$th cluster and the $j$th data point. In (2), $\delta_{ij}$ is defined from $d_{ij}$, the Euclidean distance between the center of the $i$th cluster and the $j$th data point, and $r_i$, the radius of the $i$th cluster.
EFCM stops at the $t$th iteration if the difference between the results of two consecutive iterations is less than a prefixed stop iteration threshold $\varepsilon$.
The parameters to set before executing EFCM are:
- The fuzzifier parameter $m$;
- The stop iteration threshold $\varepsilon$;
- The threshold $\eta$ used to merge the most similar clusters;
- The initial number of clusters $C^{(0)}$.
EFCM returns the centers $V$ of the final $C$ clusters, their radii $r$, and the $C \times N$ partition matrix $U$. EFCM is applied in [26] to detect hot spots, given by spatial regions where events are localized with higher density. In hot spot detection, the features consist of the two geographical coordinates locating the events, and a hot spot is approximated by a circular area on the map: the prototype of the $i$th cluster detected by EFCM is the circle identified by the couple $(v_i, r_i)$, with center coordinates $v_i = (x_i, y_i)$ and radius $r_i$.
In [26], the authors show that EFCM can approximate the shapes of hot spots on the map and is robust with respect to the presence of noise and outliers.

Fuzzy-Entropy-Based Hot Spots Reliability Evaluation
In [33,34], a measure of the reliability of hot spots detected via EFCM, based on the De Luca and Termini fuzzy entropy [31,32], is proposed. A fuzzy entropy function $h$ is monotonically increasing in $[0, 1/2)$ and monotonically decreasing in $(1/2, 1]$; it has a minimum (0) when $u$ is 0 or 1 and a maximum when $u = 1/2$.
De Luca and Termini in [31,32] propose the following fuzzy entropy function:
$$h(u) = -u \log_2 u - (1 - u) \log_2 (1 - u) \qquad (3)$$
which has the maximum value 1 when $u = 1/2$; this is called Shannon's function. If $X = \{x_1, x_2, \ldots, x_N\}$ is a discrete set, the entropy measure of fuzziness of the fuzzy set $A$ is given by:
$$H(A) = K \sum_{j=1}^{N} h(A(x_j)) \qquad (4)$$
where $K$ is a multiplicative constant. If $H(A) = 0$, then for each element $x_j$, $j = 1, \ldots, N$, $A(x_j) = 0$ or $A(x_j) = 1$, and $A$ coincides with a subset of the set $X$; if $A(x_j) = 1/2$ for each element $x_j$, then the fuzziness of $A$ is maximal. If $A$ is a crisp set, its fuzziness is null and $H(A) = 0$. The higher the fuzziness of a fuzzy set, the closer the mean membership degree of the elements of $X$ to the fuzzy set is to $1/2$.
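The entropy function and the fuzziness measure described above can be sketched in a few lines of Python. This is only an illustration: the function names `shannon_entropy` and `fuzziness` are hypothetical, the base-2 logarithm is assumed because the text states that the maximum value is 1 at u = 1/2, and the constant K is left as a parameter.

```python
import math


def shannon_entropy(u: float) -> float:
    """De Luca and Termini entropy function h(u), using a base-2 logarithm."""
    if not 0.0 <= u <= 1.0:
        raise ValueError("a membership degree must lie in [0, 1]")
    if u in (0.0, 1.0):
        return 0.0  # limit value, since u * log2(u) tends to 0 as u tends to 0
    return -u * math.log2(u) - (1.0 - u) * math.log2(1.0 - u)


def fuzziness(memberships, k: float = 1.0) -> float:
    """Entropy measure of fuzziness H(A) = K * sum over j of h(A(x_j))."""
    return k * sum(shannon_entropy(u) for u in memberships)
```

For example, `fuzziness([0.0, 1.0, 1.0])` is zero because the set is crisp, while a set whose membership degrees are all 1/2 reaches the largest value allowed by the chosen K.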
In [33,34], the fuzziness measure (4) is used to construct a new validity index applied to evaluate the optimal number of clusters in FCM. If the $i$th fuzzy cluster $A_i$, $i = 1, \ldots, C$, is considered as a fuzzy set and $u_{ij}$ is the membership degree of the $j$th data point to the $i$th cluster, the authors use the following fuzzy entropy measure of $A_i$:
$$H(A_i) = \frac{1}{N} \sum_{j=1}^{N} h(u_{ij}) \qquad (5)$$
where $N$ is the number of data points and $h$ is the De Luca and Termini fuzzy entropy function.
In [30], the reliability of the $i$th detected hot spot is measured by calculating the fuzziness of the corresponding fuzzy cluster. The reliability of each hot spot is evaluated by calculating its reliability index by (6); this is a value in the range [0,1]. Finally, the reliability thematic map is produced.
In [30], the authors propose an EFCM-based hot spot detection algorithm in which the reliability of each hot spot is calculated by (6).
Below we show this algorithm, abbreviated as HR-EFCM (Algorithm 1).
Return the partition matrix $U$, the cluster centers $v_i$, their radii $r_i$, and their reliability $R_i$, $i = 1, \ldots, C$
In [30], the HR-EFCM algorithm is applied to detect hot spots in disease analysis; the results show that the reliability of a hot spot is linearly dependent on the standard deviation of the values of the membership degrees of the data points to the corresponding fuzzy cluster. Furthermore, comparative tests show that the reliability values calculated using the hot spot reliability evaluation algorithm are correlated with the reliability values assigned by a pool of experts.

The Proposed Framework
We propose a novel method based on the HR-EFCM algorithm that evaluates the reliability of the results of the spatiotemporal evolution of hot spots.
Let $X$ be a dataset of georeferenced events partitioned into $T$ subsets $X_1, X_2, \ldots, X_T$, where $X_t$, $t = 1, 2, \ldots, T$, is the subset containing all events that occurred in the $t$th time frame.
For each subset, HR-EFCM is executed to detect the hot spots as circles on the map. For each hot spot, the reliability is calculated as in (6).
In order to analyze the localization and evolution of hot spots in a selected area on the map, the zones of this area covered by the hot spots detected in each time frame are determined; each of these zones consists of the spatial intersection between a hot spot and the selected area.
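As a rough illustration of how the extent of such a zone could be obtained outside a GIS, the sketch below estimates the area of the intersection between a circular hot spot and a selected area. It rests on simplifying assumptions that are not part of the method: the selected area is an axis-aligned rectangle, coordinates are planar, and the area is approximated by midpoint sampling on a regular grid. The function name and the grid resolution `n` are hypothetical.

```python
def circle_rect_intersection_area(cx, cy, r, xmin, ymin, xmax, ymax, n=400):
    """Estimate the area shared by a circle (center (cx, cy), radius r) and
    an axis-aligned rectangle, by midpoint sampling on an n x n grid."""
    # Sample only the overlap between the rectangle and the circle's bounding box.
    x0, x1 = max(xmin, cx - r), min(xmax, cx + r)
    y0, y1 = max(ymin, cy - r), min(ymax, cy + r)
    if x0 >= x1 or y0 >= y1:
        return 0.0  # the bounding boxes do not overlap, so neither do the shapes
    dx, dy = (x1 - x0) / n, (y1 - y0) / n
    inside = 0
    for i in range(n):
        x = x0 + (i + 0.5) * dx
        for j in range(n):
            y = y0 + (j + 0.5) * dy
            if (x - cx) ** 2 + (y - cy) ** 2 <= r * r:
                inside += 1  # this cell's midpoint falls inside the circle
    return inside * dx * dy
```

In a GIS environment the same quantity would normally come from an exact polygon overlay; the grid estimate is only meant to make the definition of the zone extent concrete.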
In Figure 1, an example of this process is shown, in which the dataset $X$ is partitioned into three subsets, corresponding to three consecutive time frames.
For each time frame an index called the hot spot strength is calculated, which measures the percentage of the extent of the selected area covered by hot spots; moreover, an assessment of the reliability of this measure is calculated, given by the weighted average of the reliability of each hot spot covering the selected area, in which the weight is the extent of the spatial intersection between the hot spot and the selected area.
The hot spot strength measured in the $t$th time frame is given by:
$$S_t = \frac{1}{D} \sum_{i=1}^{C_t} D_{i,t} \qquad (7)$$
where $C_t$ is the number of hot spots detected in the $t$th time frame, $D_{i,t}$ is the extent of the spatial intersection between the $i$th hot spot detected in the $t$th time frame and the selected area, and $D$ is the extent of the selected area.
If the $i$th hot spot does not intersect the selected area, $D_{i,t}$ is null and this hot spot does not contribute to the calculation of the hot spot strength index $S_t$.
$S_t$ takes on a value between 0 and 1; it is equal to 0 if no hot spot detected at the $t$th time frame covers the selected area; conversely, it is equal to 1 if the entire extent of the selected area is covered by hot spots detected at the $t$th time frame.
The reliability of the hot spot strength $S_t$ is evaluated by the formula:
$$RS_t = \frac{\sum_{i=1}^{C_t} D_{i,t} R_{i,t}}{\sum_{i=1}^{C_t} D_{i,t}} \qquad (8)$$
where $R_{i,t}$ is the reliability of the $i$th hot spot detected in the $t$th time frame, varying in the range [0,1]. $RS_t$ varies between 0 and 1; it is equal to 0 if the reliability of all hot spots covering the selected area detected in the $t$th time frame is zero; conversely, it is equal to 1 if the reliability of all these hot spots is equal to 1.
In synthesis, the hot spot strength of the phenomenon in the selected area at the tth time frame is measured as the ratio between the sum of the extents of the spatial intersection between the hot spots detected in this time frame and the selected area and the extent of the selected area; its reliability is given by the weighted average of the reliability of each hot spot measured by (6), where the weight is the extent of the spatial intersection between this hot spot and the selected area.
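A small worked example may help to fix the two definitions. All the numbers are hypothetical and chosen only for illustration: a selected area of extent 2.0 km² is intersected in time frame t by two hot spots, with intersection extents of 0.5 km² and 0.3 km² and reliabilities of 0.9 and 0.7.

```latex
% Hypothetical values, chosen only to illustrate the two formulas.
% Selected area: D = 2.0 km^2.
% Hot spot 1: D_{1,t} = 0.5 km^2, R_{1,t} = 0.9.
% Hot spot 2: D_{2,t} = 0.3 km^2, R_{2,t} = 0.7.
\begin{align*}
S_t  &= \frac{D_{1,t} + D_{2,t}}{D} = \frac{0.5 + 0.3}{2.0} = 0.40 \\
RS_t &= \frac{D_{1,t} R_{1,t} + D_{2,t} R_{2,t}}{D_{1,t} + D_{2,t}}
      = \frac{0.45 + 0.21}{0.80} = 0.825
\end{align*}
```

In this example, 40% of the selected area is covered by hot spots, and the larger intersection pulls the reliability of that figure toward the reliability of the first hot spot.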
In the preprocessing phase, HR-EFCM is executed for each subset in order to detect the $C_t$ hot spots, $t = 1, \ldots, T$, and calculate their reliability (Algorithm 2).

Algorithm 2 Spatiotemporal hot spots detection
1. Extract the event dataset $X$
2. Partition the dataset into $T$ subsets $X_1, X_2, \ldots, X_T$
3. For $t = 1$ to $T$ //for each subset of events that occurred in the $t$th time frame
4. Execute HR-EFCM ($X_t$, $m$, $\varepsilon$, $\eta$, $C^{(0)}$)
5. Next $t$
6. Return the $C_t$ hot spots detected and their reliability, $t = 1, \ldots, T$
After selecting an area on the map, for each time frame the hot spot strength and its reliability are calculated, respectively, via Equations (7) and (8) (Algorithm 3).
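The per-time-frame loop of the detection phase can be sketched as below. The sketch does not implement HR-EFCM: the detector is passed in as a callable, and `centroid_stub_detector` is a deliberately naive placeholder (one circle around the centroid, with an arbitrary reliability of 1.0) that exists only to make the example runnable. All names and the tuple layout of a hot spot are assumptions.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, float]                   # planar (x, y) coordinates of one event
HotSpot = Tuple[float, float, float, float]   # (center x, center y, radius, reliability)


def detect_spatiotemporal_hot_spots(
    subsets: Dict[int, Sequence[Point]],
    detector: Callable[[Sequence[Point]], List[HotSpot]],
) -> Dict[int, List[HotSpot]]:
    """Run the detector once per time frame, in chronological order."""
    return {t: detector(subsets[t]) for t in sorted(subsets)}


def centroid_stub_detector(points: Sequence[Point]) -> List[HotSpot]:
    """Placeholder, NOT HR-EFCM: a single circle enclosing all the points."""
    if not points:
        return []
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    radius = max(math.hypot(p[0] - cx, p[1] - cy) for p in points)
    return [(cx, cy, radius, 1.0)]  # the reliability value is arbitrary here
```

A real detector would bind the parameters m, ε, η and the initial number of clusters before being passed in, for example with `functools.partial`.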

Algorithm 3 Spatiotemporal hot spots strength evaluation
1. Select a zone on the map, with extent $D$
2. For $t = 1$ to $T$ //for each subset of events that occurred in the $t$th time frame
3. $S_t \leftarrow 0$
4. $RSN \leftarrow 0$
5. $RSD \leftarrow 0$
6. For $i = 1$ to $C_t$ //for all the hot spots detected by executing HR-EFCM on the $t$th subset
7. $D_{i,t} \leftarrow$ area of the part of the $i$th hot spot intersecting the selected zone
8. $S_t \leftarrow S_t + D_{i,t}$
9. $RSN \leftarrow RSN + D_{i,t} \cdot R_{i,t}$
10. $RSD \leftarrow RSD + D_{i,t}$
11. Next $i$
12. $S_t \leftarrow S_t / D$
13. $RS_t \leftarrow RSN / RSD$
14. Next $t$
15. Return the hot spot strength $S_t$ and its reliability $RS_t$, $t = 1, \ldots, T$
By analyzing the trend of the hot spot strengths calculated in each time frame, it is possible to evaluate how the diffusion of the phenomenon analyzed in the selected area has varied over time. Furthermore, the assessment of the reliability of the hot spot strength values allows the overall reliability of the results of the analysis to be evaluated.
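The strength evaluation can be sketched in Python as follows, assuming that the intersection extents and the hot spot reliabilities have already been computed for each time frame. The function name and the input layout are hypothetical, and returning a reliability of 0 when no hot spot intersects the selected area is an assumption of this sketch, since the weighted average is undefined in that case.

```python
from typing import Dict, List, Tuple


def strength_and_reliability(
    frames: Dict[int, List[Tuple[float, float]]],
    area_extent: float,
) -> Dict[int, Tuple[float, float]]:
    """Map each time frame t to (S_t, RS_t).

    frames[t] lists, for every hot spot detected in frame t, the pair
    (extent of its intersection with the selected area, its reliability).
    """
    if area_extent <= 0:
        raise ValueError("the selected area must have a positive extent")
    results = {}
    for t in sorted(frames):
        covered = 0.0   # sum of the intersection extents
        weighted = 0.0  # sum of extent * reliability
        for extent, reliability in frames[t]:
            covered += extent
            weighted += extent * reliability
        strength = covered / area_extent
        # Assumption: report 0.0 when no hot spot intersects the selected area.
        strength_reliability = weighted / covered if covered > 0 else 0.0
        results[t] = (strength, strength_reliability)
    return results
```

Hot spots that do not intersect the selected area can simply be passed with an extent of zero, or left out of the list, without changing the result.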
To test the proposed method in an application context, we took into consideration various types of criminal events (robberies, shoplifting, car thefts, acts of sexual violence, etc.) in urban agglomerations in the City of London. The tests were carried out considering the Lower Super Output Areas (for short LSOAs) in the City of London as the study area and analyzing the spread of different types of criminal events that occurred from September 2011 to July 2021. In Section 4, the results of all experiments are shown and discussed.

Experimental Results
The LSOAs in the City of London are shown in the map in Figure 2. We applied our method using a dataset of crime events that occurred in the LSOAs in the City of London from September 2011. Following the neighborhood policing model known as sector policing (https://www.cityoflondon.police.uk, accessed on 1 July 2021), the City of London is split into two sectors, east and west, with a senior leader responsible for each sector. Each sector is broken down further into three regions called clusters, each including adjoining wards. Every cluster is guarded by a group of police officers, the Dedicated Ward Officers (DWO), responsible for maintaining order in that region.
The west sector includes the Fleet Street, Bank, and Barbican clusters, while the east sector includes the Monument, Liverpool Street, and Fenchurch Street clusters. A map of the six clusters is shown in Figure 3. In these experimental tests, we applied the proposed method to analyze, by type of criminal event, the incidence of the phenomenon on each of the six clusters into which the City of London is partitioned, and its evolution over time. This analysis allows us to assess how effective the DWOs' cluster surveillance may have been and to monitor how this effectiveness changes over time.
The database used in these experiments is composed of 22,310 georeferenced crime events that occurred in the City of London from September 2011 to July 2021. It is partitioned into 14 datasets corresponding to the 14 crime types recorded by the police in England and extracted from the website https://data.police.uk/ (accessed on 1 July 2021).
We implemented our method using the GIS platform ArcGIS Desktop 10.8. The coordinate system used in our experiments was the projected British National Grid coordinate system, which is based on a Transverse Mercator projection.
In the preprocessing phase, all the recorded events without geolocation information were discarded, while the other events were georeferenced and divided by type of crime and year of occurrence.
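A minimal sketch of this preprocessing step is shown below. It assumes that each record is a dictionary with the keys `Month` (in the form `YYYY-MM`), `Longitude`, `Latitude`, and `Crime type`; these field names are assumptions about the layout of the input file and may need to be adapted, and the function name is hypothetical.

```python
from collections import defaultdict


def partition_events(records):
    """Group georeferenced events by (crime type, year of occurrence)."""
    subsets = defaultdict(list)
    for rec in records:
        lon, lat = rec.get("Longitude"), rec.get("Latitude")
        if lon in (None, "") or lat in (None, ""):
            continue  # discard events without geolocation information
        year = int(rec["Month"][:4])  # assumes a "YYYY-MM" month string
        subsets[(rec["Crime type"], year)].append((float(lon), float(lat)))
    return dict(subsets)
```

The records could be produced, for instance, by `csv.DictReader` from the Python standard library; reprojecting the longitude and latitude values to a planar coordinate system is left out of the sketch.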
For each dataset, we executed our method by partitioning it into eleven subsets, each containing the crime events that occurred in one time frame, where a time frame is one year. Table 1 shows the number of recorded and georeferenced crime events belonging to each crime type that occurred in this period in the City of London.
Criminal events belonging to the typology "theft from the person" were recorded only starting from 2019; the "bicycle theft" and "possession of weapons" events started from 2013.
For each dataset corresponding to a crime type, the spatiotemporal hot spot detection algorithm is executed, obtaining for each year the detected hot spots as circular areas on the map.
Then, the spatiotemporal hot spot strength evaluation algorithm is executed for each crime type on the six clusters, measuring for each year the hot spot strength of the crime type on the cluster and its reliability.
For each cluster in the City of London, it is possible to analyze the locations and variation over time of the areas covered by hot spots detected from events belonging to a specific crime type. For the sake of brevity, the results obtained for two crime types, "drugs" and "shoplifting", are detailed below for the Bank and Liverpool Street clusters, respectively.
The map in Figure 4 shows the areas covered by hot spots detected for drug crime events that occurred in the years 2017 (marked in blue) and 2018 (marked in red) in the Bank cluster.
One can observe that a large zone covered by a drug crime hot spot is present in the eastern area of the Bank cluster in 2017, which is no longer present in subsequent years. On the other hand, an area covered by hot spots persists in the northwest of the cluster.
In the Bank cluster, the hot spot strength for drug crimes reaches a maximum of 35% in 2016, then decreases to a minimum of less than 10% in 2018 and stabilizes at a value of around 15% from 2019. Figure 9 shows the trend of the reliability of the hot spot strength calculated by (8). The reliability fluctuates between a minimum value of 0.75, reached in 2016, and a maximum value of 0.9. Since 2019, its trend has been constant, approximately equal to 0.9.
Of particular importance in Figure 10 are the decrease over time of the hot spot strength in Bank (which halves from 80% in 2011 to 40% in 2021) and the increase in Fenchurch Street (which is zero in 2017 and reaches 50% in 2020).
The map in Figure 11 shows the areas covered by hot spots detected for drug crime events in the years 2017 (marked in blue) and 2018 (marked in red) in the Bank cluster.
In the cluster there are two hot spots, one central and the other in the border area with the Monument cluster; the latter appears starting from 2018. Of particular significance is a reduction in the central hot spot from 2017 to 2021. Figure 15 plots the trend of the hot spot strengths detected in the Liverpool Street cluster for shoplifting from 2011 to 2021.
This trend is similar to the one shown in Figure 8 for drug crimes in the Bank cluster. The hot spot strength reaches a maximum of 70% in 2013 and then decreases to reach a minimum of 10% in 2021. Figure 16 shows the trend of the reliability of the hot spot strength calculated by (8). The reliability fluctuates between a minimum value of 81%, reached in 2013, and a maximum value of 88%. Since 2016, it has remained approximately equal to the latter value.
To analyze the presence of correlations between the hot spot strength and the extent of areas with high data point density, the ratio between the extent of the areas with an annual data point density per square kilometer greater than a threshold and the extent of the Liverpool Street DWO cluster is calculated for each year. Here, we use three threshold values: 200, 300, and 400 data points per square kilometer. Table 2 shows for each year both the hot spot strength and the values of these ratios, with the thresholds set at 200 (D200), 300 (D300), and 400 (D400) data points per square kilometer.
The three trends are similar to the hot spot strength trend. In particular, the trend for the D300 index is most similar to the trend for the hot spot strength, with a mean absolute difference of 5% (12% for D200 and 7% for D400) and a Pearson's linear correlation coefficient value of 0.935 (0.922 for D200 and 0.923 for D400). Similar trends are obtained for other crime types and in all DWO clusters.
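The two similarity measures used in this comparison, the mean absolute difference and Pearson's linear correlation coefficient, can be computed for any pair of yearly series as sketched below. The function names are hypothetical, and the sketch does not reproduce the values of Table 2; it only shows the calculation.

```python
import math


def mean_absolute_difference(a, b):
    """Mean of |a_k - b_k| over two series of equal length."""
    if len(a) != len(b) or not a:
        raise ValueError("the series must be non-empty and of equal length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def pearson_correlation(a, b):
    """Pearson's linear correlation coefficient of two series."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("the series must have equal length of at least 2")
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("the correlation is undefined for a constant series")
    return cov / math.sqrt(var_a * var_b)
```

Here one series would hold the yearly hot spot strengths and the other the yearly values of one density ratio, both expressed as fractions of the cluster extent.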
Although a simple density-based statistical analysis provides results approximately similar to those obtained by measuring the hot spot strength, this analysis depends on the choice of the spatial density threshold and can only provide approximate results. In order to obtain more precise and reliable results, it is necessary to use density-based cluster algorithms; however, such algorithms have high computational complexity. The hot spot strength method, by contrast, has linear computational complexity, as it executes an FCM-based cluster algorithm to detect the hot spots; furthermore, it provides reliable results and evaluates this reliability by measuring the reliability of the hot spot strength.
Figure 20 plots the trends for the hot spot strengths obtained for all six clusters for shoplifting in the City of London.
The trends in Figure 20 show a strong decrease in hot spot strength in Liverpool Street after 2013, when it reaches its maximum value of 70%; in all other clusters, the strength never exceeds 30%. For all clusters, the hot spot strengths for shoplifting are below 20% in 2021.
Trends for the hot spot strengths of other types of crimes also show decreases in recent years in all clusters of the City of London. These results suggest that in recent years the control of the entire City of London by the police has been intensified and improved. Furthermore, since the reliability of the hot spot strength measures is always greater than 70%, and from 2016 onward always greater than 80%, all of the results obtained can be considered reliable.

Conclusions
This paper presents a novel method aimed at analyzing the spatiotemporal evolution of hot spots. The incidence of hot spots in a selected area is measured by calculating an index called the hot spot strength, and its reliability is evaluated using a method based on the De Luca and Termini fuzzy entropy measure.
This method is applied in crime analysis of the study area of the City of London, by acquiring the datasets published by the UK police relating to the various types of criminal events that occurred every year in the City of London from 2011 to 2021. The hot spot strength values measured on each cluster into which the city is partitioned and their reliability are calculated for each year and for each type of crime. The results show a general decrease in hot spot strength starting from 2016 in all clusters, which suggests an improvement in the control of the City of London by the wards in recent years. The hot spot strength measures can be considered reliable, as their reliability values are always greater than 0.7.
In the future, we intend to apply our method in different contexts and to different problems, and to test it in GIS-based platforms to monitor the location, intensity, and temporal evolution of natural, anthropogenic, or climatic events in areas affected by a certain phenomenon.

Conflicts of Interest:
The authors declare no conflict of interest.