Clustering of Road Traffic Accidents as a Gestalt Problem

Abstract: This paper introduces and illustrates an approach to automatically detecting and selecting “critical” road segments, intended for application in circumstances of limited human or technical resources for traffic monitoring and management. The reported study makes novel contributions at three levels. At the specification level, it conceptualizes “critical segments” as road segments of spatially prolonged and high traffic accident risk. At the methodological level, it proposes a two-stage approach to traffic accident clustering and selection. The first stage is devoted to spatial clustering of traffic accidents. The second stage is devoted to selection of clusters that are dominant in terms of number of accidents. At the implementation level, the paper reports on a prototype system and illustrates its functionality using publicly available real-life data. The presented approach is psychologically inspired to the extent that it introduces a clustering criterion based on the Gestalt principle of proximity. Thus, the proposed algorithm is not density-based, as are most other state-of-the-art clustering algorithms applied in the context of traffic accident analysis, but still keeps their main advantages: it allows for clusters of arbitrary shapes, does not require an a priori given number of clusters, and excludes “noisy” observations.


Introduction
Road traffic accidents represent a global health and social problem. It is estimated that approximately 1.35 million people die each year in traffic accidents, up to 50 million are injured, and the costs for countries are approximately equal to three percent of their annual gross domestic product [1]. In the EU, 22,700 people die each year in traffic accidents and 120,000 are seriously injured, while the external cost of road traffic accidents represents approximately two percent of the EU's annual gross domestic product [2].
It comes as no surprise that significant research efforts have already been devoted to the question of automatic detection of traffic-accident-prone areas. In this paper, we consider a somewhat more specific question. One way to increase traffic safety is through traffic monitoring and management. However, in circumstances of limited human or technical resources, it is necessary to select "critical" road segments to be the subject of monitoring or management. For example, Figure 1 provides a map of traffic accidents with injuries or death that occurred in "inner" Belgrade, Serbia, over the one-year period from January 2021 to December 2021. It shows a relatively dense distribution with no clear cluster separation. The research question considered in this paper can be stated as follows: given data on traffic accidents, how should we conceptualize, cluster, and select "critical" road segments? Thus, the reported study makes novel contributions at three levels:
• At the specification level, we conceptualize "critical segments" as road segments of spatially prolonged and high traffic accident risk (cf. Section 2);
• At the methodological level, we propose a two-stage approach to traffic accident clustering and selection (cf. Section 3);
• At the implementation level, we report on a prototype system and illustrate its functionality using publicly available real-life data (cf. Section 4).
The point of departure for this study is that spatial clustering of traffic accidents is a Gestalt problem. One of the traditional problems considered by Gestalt psychologists is related to the question of how humans naturally group points on a two-dimensional plane. The approach presented in this paper is psychologically inspired to the extent that it introduces a clustering criterion based on the Gestalt principle of proximity.
In line with this, the proposed algorithm is not density-based, as are most other state-of-the-art clustering algorithms applied in the context of traffic-accident analysis. On the other hand, it keeps their main advantages: it allows for clusters of arbitrary shapes, does not require an a priori given number of clusters, and excludes "noisy" observations. The rest of this paper is organized as follows. Section 2 provides an overview of related work and describes the main idea underlying this study. Section 3 formally introduces a novel approach to spatial clustering and selection of road traffic accidents. Section 4 illustrates the functionality of a prototype system. Section 5 discusses the approach from the perspective of other relevant studies inspired by the Gestalt principles. Section 6 concludes the paper.

Related Work and Main Idea
The research question of traffic accident clustering has received significant research attention [3–6]. For a more comprehensive overview, the reader may consult [3,7,8].
Here, we reflect on selected methodological aspects and emphasize the main idea of this particular study.
Some of the widely applied clustering algorithms (e.g., k-means type algorithms [9,10]) take the number of clusters as an input parameter (cf. also [11]). In practice, the observed data are clustered repeatedly by varying the input number of clusters; then, the optimal number of clusters is selected with respect to some criterion. One such criterion is based on the pooled within-cluster sum of squares around the cluster means [12]:
$$W_t = \sum_{i=1}^{t} \frac{1}{2 n_i} \sum_{j, k \in C_i} d_{jk},$$
where $t$ is the number of clusters, $C_i$ is the $i$th cluster, $n_i$ is the number of observations assigned to cluster $C_i$, and $d_{jk}$ is the pairwise distance between observations $j$ and $k$. A plot of the within-cluster dispersion versus the applied number of clusters typically contains an elbow that indicates the optimal number of clusters [12]. An alternative method to determine the optimal number of clusters is described in [13].
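To make the dispersion criterion described above concrete, the following minimal Python sketch computes it for a given partition of planar points. It assumes that the pairwise distance is the squared Euclidean distance and that the inner sum runs over ordered pairs, in which case the value equals the sum of squared distances to the cluster means; the function name and the toy points are illustrative only.

```python
from itertools import combinations

def pooled_within_dispersion(clusters):
    """Sum over clusters of D_i / (2 * n_i), where D_i adds the squared
    Euclidean distance over all ordered pairs of points in cluster i."""
    total = 0.0
    for points in clusters:
        n_i = len(points)
        unordered_sum = sum(
            (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
            for p, q in combinations(points, 2)
        )
        # Every unordered pair occurs twice among the ordered pairs.
        total += (2.0 * unordered_sum) / (2.0 * n_i)
    return total

group_a = [(0.0, 0.0), (0.0, 1.0)]
group_b = [(10.0, 0.0), (10.0, 1.0)]

w_two_clusters = pooled_within_dispersion([group_a, group_b])
w_one_cluster = pooled_within_dispersion([group_a + group_b])
```

For these toy points the dispersion drops sharply when the two well-separated groups are treated as two clusters rather than one, which is the kind of drop that produces the elbow in the plot.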
In general, the requirement that the number of clusters should be given a priori represents a limitation. In addition, the k-means algorithm considers the entire dataset and generates spherical clusters that are not necessarily suitable to represent traffic-accident-prone areas [8]. To address these limitations, the density-based DBSCAN algorithm [14] is aimed at eliminating noise from data and allowing for clusters of arbitrary shapes. Instead of an a priori given number of clusters, this algorithm accepts two different parameters: the maximum neighborhood radius and the minimum number of points required to form a dense region. The OPTICS algorithm [15] is an extension of the DBSCAN algorithm that produces a density-based clustering structure of a dataset.
It is shown in [8] that the density-based clustering algorithms perform better than the k-means algorithm in the context of traffic-accident analysis. Similarly to them, the algorithm introduced in this paper allows for clusters of arbitrary shapes and does not require that the number of clusters is given in advance. The proposed clustering approach is not density-based, but inspired by the Gestalt principle of proximity [16]. According to this principle, when humans are confronted with a number of the same visual stimuli (e.g., points on a two-dimensional plane), the most natural form of grouping involves the smallest interval. For example, for the set of points given in Figure 2i, the most natural arrangement would be abc/def/ghi, while for the set in Figure 2ii the natural grouping would be adg/beh/cfi. It is important to note that the natural grouping is by no means impeded by increasing the number of points [16].

Figure 2. The most natural arrangement in (i) would be abc/def/ghi; the most natural arrangement in (ii) would be adg/beh/cfi (inspired by [16]).
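The proximity principle can be illustrated with a small Python sketch. The coordinates below are hypothetical: they assume nine points labeled a to i in row-major order on a 3 × 3 grid, with a horizontal spacing `dx` and a vertical spacing `dy` chosen only for illustration. For each point, the sketch lists the neighbors at the smallest interval.

```python
from math import dist

def grid(dx, dy):
    """Nine labeled points on a 3 x 3 grid (row-major labels a..i)."""
    labels = "abcdefghi"
    return {labels[3 * r + c]: (c * dx, r * dy)
            for r in range(3) for c in range(3)}

def nearest_neighbors(points):
    """Map each label to the sorted labels of its closest points (ties kept)."""
    result = {}
    for label, p in points.items():
        others = {o: dist(p, q) for o, q in points.items() if o != label}
        smallest = min(others.values())
        result[label] = sorted(o for o, d in others.items() if d == smallest)
    return result

rows_closer = nearest_neighbors(grid(dx=1, dy=3))     # small horizontal interval
columns_closer = nearest_neighbors(grid(dx=3, dy=1))  # small vertical interval
```

With the small horizontal interval, every point's nearest neighbors lie in its own row (grouping abc/def/ghi); with the small vertical interval, they lie in its own column (grouping adg/beh/cfi).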
We build on the Gestalt principle of proximity and introduce a novel approach to automatic spatial clustering of road traffic accidents. At the level of specification, our study aims at detecting road segments of spatially prolonged and high traffic accident risk. A road segment is considered to be of spatially prolonged risk if it is related to a nonempty set $N$ of traffic accident locations, which can be considered close to each other by means of transitive closure. More precisely, let $R$ be a relation defined on $N$ as follows:
$$(n_i, n_j) \in R \iff d(n_i, n_j) \le \tau,$$
where $\tau$ is a spatial threshold and $d(n_i, n_j)$ is the spatial distance between traffic accidents $n_i$ and $n_j$. A cluster is formed as a transitive closure of $R$, and detection of road segments of spatially prolonged risk is achieved by means of clustering, as explained in Section 3.2. Spatial threshold $\tau$ is an input parameter to the introduced clustering algorithm, and the selection of its particular value is discussed in Section 4.2.
In addition, a road segment is considered to be of high traffic accident risk if it can be considered dominant in terms of number of accidents. The adaptive selection of high-risk road segments is introduced in Section 3.3. Thus, our approach can be represented as a two-stage algorithm. The first stage is devoted to spatial clustering of road traffic accidents. The second stage is devoted to selection of dominant clusters.

Methods
In this section, we formally introduce our two-stage approach to road traffic accident clustering and selection. Section 3.1 introduces the basic notions. Section 3.2 describes a graph-based approach to spatial clustering of traffic accidents, and Section 3.3 introduces an approach to adaptive selection of clusters that are dominant with respect to the number of traffic accidents.

Basic Notions
A road traffic accident $n_i$ is represented as follows:
$$n_i = (id_i, \phi_i, \lambda_i),$$
where
• $id_i$ is a unique identification number of $n_i$;
• $\phi_i$ and $\lambda_i$ are positional coordinates of $n_i$, i.e., latitude and longitude expressed in radians, respectively.
Spatial distance between traffic accidents $n_i$ and $n_j$ is calculated based on the haversine formula [17]:
$$d(n_i, n_j) = 2R \cdot \operatorname{atan2}\left(\sqrt{a(n_i, n_j)}, \sqrt{1 - a(n_i, n_j)}\right),$$
where
$$a(n_i, n_j) = \sin^2\left(\frac{\phi_j - \phi_i}{2}\right) + \cos\phi_i \cos\phi_j \sin^2\left(\frac{\lambda_j - \lambda_i}{2}\right),$$
function atan2 is a variant of the arctangent function designed to calculate an unambiguous angle value, and $R = 6371 \cdot 10^3$ m (i.e., mean Earth radius). In addition, let $\tau$ be a spatial threshold value representing an input parameter to the clustering algorithm, and let $N = \{n_1, n_2, \ldots, n_k\}$ be a set of traffic accidents that occurred in a given period.
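As an illustration of the distance calculation, a minimal Python sketch of the haversine formula is given below. The function and constant names are illustrative and are not taken from the prototype system; coordinates are expected in radians, as in the representation above.

```python
from math import atan2, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6371e3  # mean Earth radius in metres

def haversine_m(phi_i, lam_i, phi_j, lam_j):
    """Great-circle distance in metres between two points given by
    latitude (phi) and longitude (lam), both expressed in radians."""
    a = (sin((phi_j - phi_i) / 2) ** 2
         + cos(phi_i) * cos(phi_j) * sin((lam_j - lam_i) / 2) ** 2)
    # Guard against tiny negative values caused by rounding.
    return 2 * EARTH_RADIUS_M * atan2(sqrt(a), sqrt(max(0.0, 1 - a)))

# One degree of latitude along a meridian (longitude value is arbitrary).
one_degree_m = haversine_m(radians(44.0), radians(20.0),
                           radians(45.0), radians(20.0))
```

One degree of latitude corresponds to roughly 111 km, which offers a quick sanity check of the implementation.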

The Clustering Algorithm
The proposed clustering approach adapts the graph-based image segmentation algorithm introduced in [18] (cf. also [19]) and can be described as follows:
1. Throughout the algorithm execution, current clustering results are represented by an integer array
$$C = [c(n_1), c(n_2), \ldots, c(n_k)],$$
where $c(n_i)$, with $i \in \{1, 2, \ldots, k\}$, represents the identification number of the cluster to which traffic accident $n_i$ is currently assigned. In Step 1, each traffic accident is assigned to its own cluster, i.e., $c(n_i) = i$ for all $i \in \{1, 2, \ldots, k\}$.
2. Let $D(N, \tau)$ be a set of all combinations of two traffic accidents (i.e., a set of all unordered pairs of traffic accidents) whose mutual distance is less than or equal to the threshold value $\tau$. In other words, set $D(N, \tau)$ contains pairs of traffic accidents that are considered close to each other and are thus candidates to be in the same cluster. Without loss of generality, set $D(N, \tau)$ can be defined as
$$D(N, \tau) = \{(n_i, n_j) \mid n_i, n_j \in N,\ i < j,\ d(n_i, n_j) \le \tau\}.$$
3. We generate a sequence $\tilde{D}(N, \tau) = (\delta_1, \delta_2, \ldots, \delta_m)$ that contains all elements from $D(N, \tau)$ ordered by nondecreasing distance between traffic accidents.
4. We iterate through sequence $\tilde{D}(N, \tau)$ from the first to the last position. For each pair $\delta_p = (n_i, n_j)$ in $\tilde{D}(N, \tau)$, if traffic accidents $n_i$ and $n_j$ belong to different clusters $c(n_i)$ and $c(n_j)$, then those clusters are merged, i.e.,
$$c(n_l) \leftarrow c(n_i) \quad \text{for all } n_l \in N \text{ such that } c(n_l) = c(n_j).$$
Thus, the clustering is performed by means of transitive closure of the undirected graph over set $N$ defined in Step 3 (cf. sequence $\tilde{D}(N, \tau)$).
The clustering results are represented by array C after Step 4 is completed. In general, array C generated in this algorithm stage contains information on t clusters, where 1 ≤ t ≤ k (i.e., the number of clusters is equal to the number of distinct values in C).
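The four steps above can be sketched in Python as follows. This is a minimal illustration and not the prototype implementation: it keeps cluster labels in a union-find structure instead of relabeling an array on every merge (a substitution made here for brevity), accepts any distance function, and uses planar toy points with the Euclidean distance. The function name is illustrative.

```python
from itertools import combinations
from math import dist

def cluster_by_proximity(points, tau, distance=dist):
    """Return one cluster label per point; points linked by a chain of
    pairwise distances <= tau share a label."""
    k = len(points)
    parent = list(range(k))  # Step 1: every point starts in its own cluster

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Step 2: unordered pairs whose mutual distance does not exceed tau.
    close_pairs = []
    for i, j in combinations(range(k), 2):
        d = distance(points[i], points[j])
        if d <= tau:
            close_pairs.append((d, i, j))

    # Step 3: order the pairs by nondecreasing distance.
    close_pairs.sort()

    # Step 4: merge the clusters of every close pair.
    for _, i, j in close_pairs:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_j] = root_i

    return [find(i) for i in range(k)]

labels = cluster_by_proximity([(0, 0), (1, 0), (2, 0), (10, 0)], tau=1.5)
```

In this toy example the first and third points are farther apart than the threshold, yet they end up in the same cluster because both are close to the second point; the fourth point remains a cluster of its own.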

Cluster Selection
In the second algorithm stage, a subset of clusters that are dominant with respect to the number of traffic accidents is adaptively selected. Let $\chi(C)$ be the histogram of array $C$, i.e.,
$$\chi(C) = \{(c_1, p_1), (c_2, p_2), \ldots, (c_t, p_t)\},$$
where
• $c_i$ is the identification number of a cluster contained in array $C$;
• $p_i$ is the number of traffic accidents assigned to cluster $c_i$.
The adaptive cluster selection algorithm represents an adaptation of the method of threshold selection for image binarization introduced in [20] (pp. 120-121; cf. also [21]) and can be described as follows.

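1. The starting threshold value $\mu_0$ is set to the average number of traffic accidents per cluster:
$$\mu_0 = \frac{1}{t} \sum_{i=1}^{t} p_i.$$
2. Given a current threshold value $\mu_i$, where $i \ge 0$, set $\chi(C)$ is divided into two disjoint subsets based on $\mu_i$:
$$\chi_1 = \{(c_j, p_j) \in \chi(C) \mid p_j \le \mu_i\}, \qquad \chi_2 = \{(c_j, p_j) \in \chi(C) \mid p_j > \mu_i\},$$
and the subsequent threshold value $\mu_{i+1}$ is calculated as
$$\mu_{i+1} = \frac{1}{2} \left( \frac{1}{|\chi_1|} \sum_{(c_j, p_j) \in \chi_1} p_j + \frac{1}{|\chi_2|} \sum_{(c_j, p_j) \in \chi_2} p_j \right).$$
3. If the change in threshold is not significant, i.e., the difference $|\mu_{i+1} - \mu_i|$ is negligible, the calculation is completed and the final threshold $\mu$ is set to $\mu_{i+1}$. Otherwise, the process returns to Step 2.
Finally, a subset of clusters that are dominant with respect to the number of traffic accidents is adaptively derived by applying the calculated threshold value $\mu$:
$$\{c_i \mid (c_i, p_i) \in \chi(C),\ p_i > \mu\}.$$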
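The selection stage can be sketched in Python as shown below. The sketch follows the iterative threshold selection described above under several assumptions that are not fixed by the text: clusters whose size equals the current threshold are placed in the lower subset, convergence is tested against a hypothetical tolerance `eps`, and a cluster is selected when its size strictly exceeds the final threshold. The function name is illustrative.

```python
from collections import Counter

def select_dominant(labels, eps=1e-6, max_iter=1000):
    """Return (selected cluster ids, final threshold) for a list that holds
    one cluster label per traffic accident."""
    sizes = Counter(labels)  # histogram: cluster id -> number of accidents
    mu = sum(sizes.values()) / len(sizes)  # Step 1: average cluster size
    for _ in range(max_iter):
        low = [p for p in sizes.values() if p <= mu]
        high = [p for p in sizes.values() if p > mu]
        if not high:  # all clusters have the same size; nothing dominates
            break
        # Step 2: midpoint between the mean sizes of the two subsets.
        new_mu = (sum(low) / len(low) + sum(high) / len(high)) / 2
        converged = abs(new_mu - mu) < eps  # Step 3 (assumed tolerance)
        mu = new_mu
        if converged:
            break
    return {c for c, p in sizes.items() if p > mu}, mu

# Toy labels: clusters 0-3 hold 1 accident, 4 holds 2, 5 holds 10, 6 holds 12.
toy_labels = [0, 1, 2, 3] + [4] * 2 + [5] * 10 + [6] * 12
selected, threshold = select_dominant(toy_labels)
```

For the toy labels, the threshold starts at the average size of 4 and settles between the many small clusters and the two large ones, so only the two large clusters are selected.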

Results
This section reports on the prototype system and describes the results obtained when it was applied to real-life data.

Tools
A prototype system based on the approach introduced in Section 3 is implemented in the Racket programming language. To graphically represent spatial data and estimate areas covered by clusters, we applied the ArcMap component of Esri's ArcGIS suite.

Spatial Threshold Selection
Spatial threshold τ introduced in Section 3.2 represents an input parameter to the clustering algorithm. We set threshold τ to 200 m for the following reason. The national urban speed limit is set to 50 km/h [22] (cf. article 43). However, to account for the relationship between the posted speed limit and actual speeds in urban areas, we consider the minimum speeding offense of exceeding the speed limit by up to 20 km/h [22] (cf. article 333). Therefore, we assume a driver operating her or his vehicle at a speed of 70 km/h and define the spatial threshold as the distance traveled by this vehicle in ten seconds (i.e., τ ≈ 200 m).
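The arithmetic behind this value can be checked with a short Python sketch. The helper name is illustrative; the speed limit, the speeding offset, and the ten-second interval are the values stated above.

```python
def distance_travelled_m(speed_kmh, seconds):
    """Distance in metres covered at a constant speed over a time interval."""
    return speed_kmh * 1000.0 / 3600.0 * seconds

# 50 km/h limit plus a 20 km/h offset, travelled for ten seconds.
tau_exact = distance_travelled_m(50 + 20, 10)
tau_rounded = round(tau_exact, -2)  # rounded to the nearest hundred metres
```

The exact distance is slightly below 200 m, so the adopted threshold is a rounded value. The same helper could be used by a practitioner to derive a threshold from a different speed or time interval.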
Although spatial threshold τ is assigned a particular value, we recall that it is introduced as an input parameter. In general, its value is intended to be set according to external criteria, which may vary with the application context. Thus, the spatial threshold is not learned as a hyperparameter in the sense typically found in the field of machine learning. Instead, it is intentionally left to the practitioner to decide on the spatial threshold value, i.e., on the maximum distance between two traffic accident locations that are considered close to each other.

Data
We resort to a publicly available dataset on traffic accidents provided by the Ministry of Interior of the Republic of Serbia. To illustrate the functionality of the prototype system (cf. Section 4.4), we use a part of this dataset containing details on 15,366 road traffic accidents that occurred in Belgrade, the capital of Serbia, over the one-year period from January 2021 to December 2021 [23]. Those accidents can be divided into three groups: 11,294 accidents without injuries or death, 3996 accidents with injuries, and 76 accidents with death. We consider only severe road traffic accidents from the last two groups, i.e., 4072 (3996 + 76) accidents with injuries or death. For each accident, the prototype system considers only its unique identification number and positional coordinates (i.e., latitude and longitude). The map showing a subset of road traffic accidents with injuries or death that occurred in "inner" Belgrade during 2021 is given in Figure 1.
To estimate the stability of results through time (cf. Section 4.5), the algorithm is applied to data on traffic accidents with injuries or death that occurred in one of the "inner" Belgrade municipalities (the municipality of Zvezdara) over the three-year period from January 2019 to December 2021 [23][24][25].

Algorithm Execution
In the first algorithm stage, 4072 traffic accidents are divided into 1439 clusters. The average number of accidents per cluster is 2.796, with a standard deviation of 8.909. In the second algorithm stage, only ten clusters are selected as dominant with respect to the number of traffic accidents. The average number of accidents per selected cluster is 73.3, with a standard deviation of 69.103 (cf. Table 1). The map representation of the selected clusters is given in Figure 3. Although the map shows only "inner" Belgrade, it contains all ten clusters selected when the prototype system was applied to data on traffic accidents in the entire city. The numbers of traffic accidents assigned to each cluster are provided in the second row of Table 2. The cluster identification numbers given in this table correspond to those given in the legends of Figures 3 and 4.

There is a set of well-established measures that are often applied to analyze results of traffic accident clustering by means of evaluating the tightness and separation of clusters: the silhouette coefficient [13], Calinski-Harabasz index [26], Davies-Bouldin index [27], etc. However, these measures are rather general (i.e., task-independent). Consequently, validation approaches based on these measures lack task-related criteria. In contrast to them, we apply a qualitative evaluation based on traffic-related criteria.
In line with this, the obtained results can be considered promising: ten selected clusters covering approximately 0.11 percent of the city area (i.e., 3.65 km² out of approximately 3233 km², cf. Table 2) capture 18 percent of all traffic accidents (i.e., 733 out of 4072, cf. Table 1).
For the purpose of further illustration, we compare the clustering results with the locations of traffic camera poles derived from the publicly available information provided by the Ministry of Interior of the Republic of Serbia [28]. To justify this decision, it is important to clarify the following:
• The locations of camera poles are determined by a third party, independent of this study.
• The introduced algorithm is agnostic of the camera pole locations, i.e., they are not considered in the clustering process.
• The traffic accident data used to generate clusters were collected during 2021. At the moment of conducting this study (i.e., March 2022), the considered traffic cameras had not yet been put into use, i.e., they did not influence the traffic behavior in the observed period.
Thus, the particular camera pole locations can serve as an indirect "response" variable. Out of 464 camera poles installed in Belgrade, seventy are located within the selected clusters. The numbers of camera poles within each cluster are provided in Table 2. The map representation of the selected clusters and camera poles within them is given in Figure 4. It can be observed that the ten selected clusters, which cover 0.11 percent of the city area, capture 15 percent of the camera poles.
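The shares quoted in this section follow directly from the reported counts and areas, as the following Python sketch shows. The helper and variable names are illustrative; all input numbers are taken from the text.

```python
def percent(part, whole):
    """Share of `part` in `whole`, expressed in percent."""
    return 100.0 * part / whole

area_share = percent(3.65, 3233)     # selected cluster area vs. city area (km²)
accident_share = percent(733, 4072)  # accidents inside the selected clusters
camera_share = percent(70, 464)      # camera poles inside the selected clusters
```

The three values round to 0.11, 18, and 15 percent, respectively, which matches the figures stated above.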

Stability of Results through Time
To estimate the stability of results through time, the introduced algorithm is applied to data collected in the same spatial area at different periods. The previous section considers the entire city of Belgrade, which has a surface area of approximately 3233 km². In this section, the same spatial threshold (i.e., τ = 200 m) is applied to just one of the "inner" Belgrade municipalities (the municipality of Zvezdara), which has a surface area of approximately 31.11 km² (i.e., approximately 0.96 percent of the city surface area). In line with our goal to introduce an approach suitable for application in circumstances of limited human or technical resources for traffic monitoring and management, this municipality was selected as one of the "inner" municipalities with the fewest camera poles. It contains only 16 out of 464 camera poles installed in Belgrade.
The algorithm is applied to publicly available data on traffic accidents with injuries or death that occurred in the municipality of Zvezdara over the three-year period from January 2019 to December 2021. The maps showing road traffic accidents that occurred in this municipality during 2021, 2020, and 2019 are given in Figure 5a,c,e, respectively. The corresponding map representations of the selected clusters are given in Figure 5b,d,f, respectively. The camera pole locations (March 2022) are represented for the purpose of completeness. A summary of the clustering and selection results is given in Table 3. The selected clusters are described in Table 4.

Table 3. A summary of the clustering and selection results obtained when the introduced algorithm was applied to publicly available data on traffic accidents with injuries or death that occurred in the municipality of Zvezdara during 2021, 2020, and 2019, respectively.

The stability of results through time is considered in two aspects: the share of traffic accidents belonging to the selected clusters, and the overlapping surface area between the selected clusters in all three years. With regard to the first aspect, it can be observed that the share of traffic accidents belonging to the selected clusters is steady through the given three-year period (i.e., 36.59, 35.82, and 38.97 percent).

With regard to the second aspect, a significant overlap between the selected clusters in all three years can be observed. The overlapping surface area is 0.353 km², which constitutes 51.53 percent of the selected surface area in 2021, 55.15 percent of the selected surface area in 2020, and 26.23 percent of the selected surface area in 2019.

Discussion
In addition to reporting the algorithm results, we discuss the introduced approach from the perspective of other relevant studies inspired by the Gestalt principles. The idea of applying human cognitive judgments reflecting the principles of visual Gestalt perception is not new. For example, [29] introduces a clustering algorithm based on local k-dimensional neighbors of each point, allowing for an arbitrary number of clusters and arbitrary cluster shapes. However, the implicit conceptualization of proximity in [29] differs from the conceptualization adopted in our study. According to the conceptualization adopted in [29], the pattern of points given in Figure 6i contains three clusters: two large clusters and one "chain" cluster between them. In our approach, the proximity of points (i.e., locations) is defined by means of transitive closure, so the same pattern contains only one cluster (cf. Figure 6ii). More recently, two different approaches to saliency detection in digital images based on the Gestalt principles were proposed in [30,31]. In particular, with respect to the Gestalt principle of proximity, these approaches consider color distance between image regions and implicitly include the transitive closure. However, both approaches restrict the selected image regions only to neighbors of a currently salient image region. Even the image segmentation algorithm introduced in [18], on which we build in this contribution, includes a pairwise region comparison predicate. This restriction is not present in our approach, as briefly discussed in the remainder of this section.
In [18], the difference between segments $C_i$ and $C_j$ is defined as the minimum weight edge connecting them, i.e.,
$$\mathrm{Dif}(C_i, C_j) = \min_{v_i \in C_i,\ v_j \in C_j,\ (v_i, v_j) \in E} w(v_i, v_j),$$
where $v_i$ and $v_j$ are two neighboring pixels (i.e., $(v_i, v_j) \in E$) belonging, respectively, to segments $C_i$ and $C_j$, and $w(v_i, v_j)$ represents the color distance between $v_i$ and $v_j$. In our approach, we consider spatial distance between traffic accident locations, but define the distance between two clusters in the same manner. On the other hand, to detect evidence of a boundary between segments $C_i$ and $C_j$, the approach introduced in [18] assumes that their difference must be greater than the smaller of their internal differences $\mathrm{Int}(C_i)$ and $\mathrm{Int}(C_j)$, each increased by a threshold value:
$$\mathrm{Dif}(C_i, C_j) > \min\bigl(\mathrm{Int}(C_i) + \tau(C_i),\ \mathrm{Int}(C_j) + \tau(C_j)\bigr),$$
where threshold values are defined as inversely proportional to the size of a segment, i.e., $\tau(C) \sim 1/|C|$. We relax this condition in our approach: in order to detect evidence of a boundary between two clusters, their distance must be greater than a constant threshold value (cf. Equation (8) in Section 3.2). The justification for this decision is related to the domain of this study. In line with our aim to detect road segments of spatially prolonged traffic accident risk, we do not require stronger evidence for a boundary of relatively smaller clusters (and, therefore, do not consider internal cluster differences). The input threshold parameter allows for controlling the scale of observation: a larger threshold value causes a preference for larger clusters.
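The difference between the two boundary criteria can be sketched in Python as follows. Both functions are illustrative: the size-dependent threshold is assumed to take the form `scale / size`, where `scale` is a hypothetical constant of proportionality, and the numeric values in the example are invented for demonstration only.

```python
def boundary_constant(dif, tau):
    """Criterion used in this approach: a boundary exists when the distance
    between two clusters exceeds a constant threshold."""
    return dif > tau

def boundary_size_dependent(dif, int_i, int_j, size_i, size_j, scale):
    """Criterion in the style of [18]: the difference is compared with the
    internal differences, each increased by a size-dependent threshold
    (assumed here to be scale / size)."""
    return dif > min(int_i + scale / size_i, int_j + scale / size_j)

# A single location lying 150 m from a large cluster of 50 locations.
keeps_apart_constant = boundary_constant(dif=150.0, tau=200.0)
keeps_apart_size_dependent = boundary_size_dependent(
    dif=150.0, int_i=0.0, int_j=120.0, size_i=1, size_j=50, scale=200.0)
```

With these invented values, the constant criterion finds no boundary and merges the single location into the large cluster, whereas the size-dependent criterion reports a boundary because of the internal difference of the large cluster.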

Conclusions
This paper introduced an approach to automatically detecting and selecting road segments of spatially prolonged and high traffic accident risk, intended for application in circumstances of limited human or technical resources for traffic monitoring and management. It also reported on a prototype system and illustrated its functionality using publicly available real-life data on road traffic accidents that occurred in Belgrade. The approach was positively evaluated in two aspects: (i) comparing the clustering results with the locations of traffic camera poles installed a posteriori; (ii) the stability of results through time.
To conclude, we first reflect on the comprehensiveness of the feature set that represents a traffic accident. Machine-learning-based approaches to traffic accident clustering typically deal with a number of features, including road features (e.g., road surface, road type, vehicle type, etc.), environmental features (e.g., date, time, weather, etc.), and human features (e.g., participant's age and gender, violation of law, etc.) [3,7,32]. The application of those approaches assumes the existence of a dataset that is rather comprehensive in terms of features. However, the comprehensiveness of available datasets varies between different geographical areas and time periods. In contrast, a traffic accident in our approach is represented by two positional coordinates only, i.e., latitude and longitude, which broadens its applicability.
Related to the time complexity of the proposed approach, the clustering algorithm introduced in Section 3.2 represents the dominant component. Its running time can be factored as follows.
Step 1 takes constant time. In Step 2, for a given set containing $k$ traffic accidents, there are $k^2$ candidate elements for set $D(N, \tau)$, i.e., this step takes $O(k^2)$ time.
In the example given in Section 4, the number of traffic accidents was $k = 4072$, which means that approximately $k^2 \approx 16.6$ million candidate pairs were considered. However, the number of elements in set $D(N, \tau)$, which corresponds to the memory footprint of Step 2, does not necessarily follow this pattern. For example, set $D(N, \tau)$ produced in the example contained only $m = 8407$ pairs. In general, the size of set $D(N, \tau)$ depends on threshold value $\tau$. Finally, it was shown in [18] that Steps 3 and 4 can be implemented in $O(m \log m)$ and $O(m \alpha(m))$ time, where $\alpha$ is the very slow-growing inverse Ackermann function.
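The orders of magnitude discussed above can be reproduced with a few lines of Python. The variable names are illustrative; the values of k and m are those reported in the text, and the count of unordered pairs is added here only for comparison.

```python
from math import comb, log2

k = 4072  # traffic accidents considered in the example
m = 8407  # pairs retained in D(N, tau) for tau = 200 m

ordered_candidates = k * k    # the k^2 figure quoted in the text
unordered_pairs = comb(k, 2)  # distinct unordered pairs, k(k - 1) / 2
retained_fraction = m / unordered_pairs
sort_cost_estimate = m * log2(m)  # rough comparison count for Step 3
```

Only about one in a thousand unordered pairs falls within the threshold in this example, which is why the sorting and merging steps operate on a far smaller input than the pairwise distance computation of Step 2.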