Article

A Clustering-Based Approach to Detecting Critical Traffic Road Segments in Urban Areas

1 Ministry of the Interior of the Republic of Serbia, Kneza Miloša 101, 11000 Beograd, Serbia
2 Department of Information Technology, University of Criminal Investigation and Police Studies, Cara Dušana 196, 11080 Beograd, Serbia
3 School of Electrical and Computer Engineering, Academy of Technical and Art Applied Studies, Vojvode Stepe 283, 11000 Beograd, Serbia
4 Faculty of Social Sciences, University Business Academy in Novi Sad, Bulevar Umetnosti 2a, 11070 Beograd, Serbia
* Author to whom correspondence should be addressed.
Axioms 2023, 12(6), 509; https://doi.org/10.3390/axioms12060509
Submission received: 31 March 2023 / Revised: 18 May 2023 / Accepted: 19 May 2023 / Published: 24 May 2023
(This article belongs to the Special Issue Advances in Numerical Algorithms for Machine Learning)

Abstract

This paper introduces a parameter-free clustering-based approach to detecting critical traffic road segments in urban areas, i.e., road segments of spatially prolonged and high traffic accident risk. In addition, it proposes a novel domain-specific criterion for evaluating the clustering results, which promotes the stability of the clustering results through time and inter-period accident spatial collocation, and penalizes the size of the selected clusters. To illustrate the proposed approach, it is applied to data on traffic accidents with injuries or death that occurred in three of the largest cities of Serbia over a three-year period.

1. Introduction

Clustering has an important role in road traffic data analysis. Two research lines currently receive the most attention in the field. The first line is related to traffic logistics, e.g., traffic load and congestion analysis, and vehicle routing. The second line is related to traffic safety, e.g., traffic accident pattern detection, hotspot detection and critical road segment detection. Some recent studies that employ clustering in the context of traffic analysis are summarized in Table 1.
This paper follows the second research line. It introduces a parameter-free approach to clustering critical traffic road segments in urban areas, i.e., road segments of spatially prolonged and high traffic accident risk. In this respect, we build on and extend the specific approach introduced in [19]. Two traffic accidents are considered related (i.e., as belonging to the same cluster) if the spatial distance between them is less than or equal to a predefined threshold value $\hat{\tau}$, i.e.,
$$a_i \sim a_j \iff d(a_i, a_j) \le \hat{\tau}. \tag{1}$$
A road segment is considered to be at spatially prolonged traffic accident risk if it is associated with a set of traffic accidents $A = \{a_1, a_2, \ldots, a_n\}$ in a given period such that the transitive closure of relation (1) over set $A$ provides a connected graph. A road segment is considered to be at high traffic accident risk when it is associated with a significant number of accidents when compared to other road segments in the given area. In the first phase of the algorithm, clusters are determined by the transitive closure of relation (1), which can be described as follows. Let $A$ be a set of traffic accidents, each of which is described only by its positional coordinates. At the start of the algorithm, each accident $a_i \in A$ is assigned to a separate cluster $k(a_i)$. In addition, let $X(A, \hat{\tau})$ be a sequence of all combinations of two traffic accidents whose distance is less than or equal to $\hat{\tau}$. This sequence is ordered by non-decreasing distance between traffic accidents:
$$X(A, \hat{\tau}) = \langle (a_{11}, a_{12}), (a_{21}, a_{22}), \ldots, (a_{n1}, a_{n2}) \rangle, \tag{2}$$
where
$$(\forall\, 1 \le i \le n)\,\big(\{a_{i1}, a_{i2}\} \subseteq A \;\wedge\; d(a_{i1}, a_{i2}) \le \hat{\tau} \;\wedge\; (i < j \Rightarrow d(a_{i1}, a_{i2}) \le d(a_{j1}, a_{j2}))\big). \tag{3}$$
Sequence $X(A, \hat{\tau})$ is iterated from the first to the last ordered pair. For each pair $(a_{i1}, a_{i2})$, clusters $k(a_{i1})$ and $k(a_{i2})$ are merged if they differ, i.e.,
$$\text{for each } (a_{i1}, a_{i2}) \in X(A, \hat{\tau})\colon\ \text{if } k(a_{i1}) \ne k(a_{i2}) \text{ then merge clusters } k(a_{i1}) \text{ and } k(a_{i2}). \tag{4}$$
In other words, the clusters are merged in a bottom–up manner. In the second phase, the clusters that are dominant in terms of number of accidents are selected as critical. This phase represents an adaptation of the method of threshold selection for image binarization introduced in [20] (pp. 120–121).
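The first phase described above can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: it assumes that accident positions are planar coordinates in metres (e.g., already projected), that distances are Euclidean, and the names `cluster_accidents`, `points` and `tau` are hypothetical. The second phase (selection of the dominant clusters) is not covered.

```python
from itertools import combinations
from math import hypot


def cluster_accidents(points, tau):
    """Merge accidents whose pairwise distance is at most tau (first phase only).

    points: list of (x, y) planar coordinates in metres (an assumption).
    Returns clusters as sorted lists of point indices.
    """
    parent = list(range(len(points)))  # each accident starts in its own cluster

    def find(i):
        # Follow parent links to the cluster representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Sequence X(A, tau): all pairs within the threshold, by non-decreasing distance.
    pairs = []
    for i, j in combinations(range(len(points)), 2):
        d = hypot(points[i][0] - points[j][0], points[i][1] - points[j][1])
        if d <= tau:
            pairs.append((d, i, j))
    pairs.sort()

    # Merge the clusters of each pair if they are still different.
    for _, i, j in pairs:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    clusters = {}
    for i in range(len(points)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

The ordering of the pairs does not change the final partition; it is kept here only to mirror the description. Enumerating all pairs is quadratic in the number of accidents, so a spatial index would likely be preferable for large datasets.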
It should be noted that the spatial threshold $\hat{\tau}$ was applied as an input hyperparameter to this algorithm. However, in the general case, its optimal value should be adaptively derived depending on a given traffic area. The contribution of this study can be summarized as follows:
  • We introduce an approach to automatic threshold value estimation based on knee-point detection. In general, a knee point is considered the operational point at which a system achieves the best trade-off between cost and performance, both of which depend on a tunable parameter. Thus, the traffic accident data are clustered repeatedly by varying threshold value $\hat{\tau}$, and the operational threshold value is selected with respect to the introduced internal evaluation measure. Various knee-point detection algorithms have already been applied to determine the optimal number of clusters, cf. [21,22,23]. However, the criteria for the evaluation of clustering results are usually defined in a domain-independent manner, e.g., based on the within-cluster dispersion, between-cluster dispersion, etc. In contrast to those approaches, this paper proposes a novel domain-specific criterion for evaluating the clustering results, which promotes the stability of clustering results through time and inter-period accident spatial collocation, and penalizes the size of selected clusters.
  • We propose an adaptation of the Kneedle algorithm [24] aimed at the automatic determination of the operational threshold value.
  • In our approach, an urban area (e.g., a city) encompasses a set of possibly diverse administrative units (e.g., municipalities), each of which exercises traffic control jurisdiction over its roads. Thus, the criteria for the determination of critical road segments may differ among different administrative units. One of the novelties of the proposed approach is that traffic analysis is conducted for each administrative unit separately, but the clustering results are evaluated at the level of the entire urban area.
  • For the purpose of illustration, the proposed approach is applied to data on traffic accidents with injuries or death that occurred in three of the largest cities of Serbia over a three-year period [25,26,27], as summarized in Table 2. For each accident, only its unique identification number and positional coordinates are taken into account. In an external validation, the obtained clustering results are positively evaluated with respect to the locations of traffic cameras.
The rest of this paper is structured as follows. Section 2 introduces an evaluation measure for traffic accident clustering. Section 3 proposes an adaptation of the Kneedle algorithm. Section 4 and Section 5 present the results and evaluation of the proposed approach. Section 6 concludes the paper.

2. Evaluation Measure for Traffic Accident Clustering

This section introduces an evaluation measure for traffic accident clustering based on three separate but related submeasures:
  • Stability of clustering results through time;
  • Inter-period accident spatial collocation;
  • Area covered by the selected clusters.

2.1. Stability of Clustering Results through Time

To estimate the stability of clustering results through time, the clustering algorithm is applied to data on traffic accidents collected in the same spatial areas over two different periods, which we denote as $P_1$ and $P_2$, respectively, where $P_1$ precedes $P_2$.
Without loss of generality, let us assume that a city has $n$ municipalities, represented by the vector:
$$M = \langle m_1, m_2, \ldots, m_n \rangle. \tag{5}$$
The clustering algorithm is applied separately for each municipality. For the given threshold value $\tau_j$ and municipality $m_i$, the following steps are performed:
1. The clustering algorithm [19] is applied to data on traffic accidents that occurred in municipality $m_i$ over period $P_1$.
2. We calculate the shares (i.e., percentages) of all traffic accidents that occurred in municipality $m_i$ over periods $P_1$ and $P_2$, respectively, that belong to the clusters selected in Step 1. For period $P_2$, an accident is counted if it lies within the area covered by one of these clusters. We denote these shares as $s_1(m_i, \tau_j, P_1)$ and $s_2(m_i, \tau_j, P_2)$.
Example 1.
Let us adopt the following input parameter settings:
  • The municipality of Zvezdara (denoted as $m$);
  • Threshold value $\tau = 170$ m;
  • Period $P_1$ runs from January 2019 to December 2020;
  • Period $P_2$ runs from January 2021 to December 2021.
The execution of the above algorithm for the adopted parameter settings can be summarized as follows:
1. Over period $P_1$, 631 traffic accidents with injuries or death occurred in the municipality of Zvezdara. Figure 1a shows a map of all these traffic accidents. When the clustering algorithm is applied to this set of traffic accidents, four clusters are obtained, as shown in Figure 1b.
2. The four selected clusters contain 257 traffic accidents. Thus, the share of the traffic accidents that occurred in the municipality over period $P_1$ that belong to the selected clusters is
$$s_1(m, \tau, P_1) = \frac{257}{631} = 40.729\%. \tag{6}$$
3. Over period $P_2$, 317 traffic accidents with injuries or death occurred in the municipality of Zvezdara. Figure 1c shows a map of all these traffic accidents. In addition, each of the clusters given in Figure 1b is represented as the minimum bounding box of its convex hull in Figure 1c. The number of accidents that occurred over this period and belong to the areas covered by the clusters selected in Step 1 is 131. The share of the captured traffic accidents is
$$s_2(m, \tau, P_2) = \frac{131}{317} = 41.325\%. \tag{7}$$
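The share computation in the example above can be sketched in Python as follows. The sketch makes a simplifying assumption that is not in the source: the area covered by a cluster is approximated by an axis-aligned bounding box, whereas the text specifies the minimum bounding box of the convex hull, which may be oriented differently. The function names are illustrative, and coordinates are assumed to be planar.

```python
def bounding_box(cluster):
    """Axis-aligned bounding box (x_min, y_min, x_max, y_max) of a cluster.

    Assumption: this stands in for the minimum bounding box of the convex hull.
    """
    xs = [x for x, _ in cluster]
    ys = [y for _, y in cluster]
    return min(xs), min(ys), max(xs), max(ys)


def share_in_boxes(accidents, boxes):
    """Fraction of accidents lying inside at least one of the given boxes."""
    captured = sum(
        1
        for x, y in accidents
        if any(x0 <= x <= x1 and y0 <= y <= y1 for x0, y0, x1, y1 in boxes)
    )
    return captured / len(accidents)


# Shares from the example, recomputed from the reported counts.
s1 = 257 / 631  # accidents in the selected clusters, period P1
s2 = 131 / 317  # accidents captured by the cluster areas, period P2
print(f"{s1:.3%} {s2:.3%}")  # prints 40.729% 41.325%
```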
When this sequence of steps is performed for all municipalities in $M$, the result can be represented by two vectors:
$$\begin{aligned} S_1(M, \tau_j, P_1) &= \langle s_1(m_1, \tau_j, P_1), s_1(m_2, \tau_j, P_1), \ldots, s_1(m_n, \tau_j, P_1) \rangle, \\ S_2(M, \tau_j, P_2) &= \langle s_2(m_1, \tau_j, P_2), s_2(m_2, \tau_j, P_2), \ldots, s_2(m_n, \tau_j, P_2) \rangle. \end{aligned} \tag{8}$$
In general, municipalities in a city may differ in area, the number of inhabitants, traffic density and various other factors. However, we consider them as being equally important in estimating the stability of clustering results through time. Therefore, for a given threshold value $\tau_j$, the stability of clustering results through time is estimated as the cosine similarity between the vectors in Equation (8):
$$s(M, \tau_j, P_1, P_2) = \frac{\sum_{k=1}^{n} s_1(m_k, \tau_j, P_1) \cdot s_2(m_k, \tau_j, P_2)}{\sqrt{\sum_{k=1}^{n} s_1^2(m_k, \tau_j, P_1)} \cdot \sqrt{\sum_{k=1}^{n} s_2^2(m_k, \tau_j, P_2)}}. \tag{9}$$
Since all elements of the vectors in Equation (8) are non-negative, value $s(M, \tau_j, P_1, P_2)$ is always in range $[0, 1]$, where value 1 represents the maximum stability (i.e., the maximum similarity between the vectors), and 0 represents the minimum stability.
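As a minimal sketch, the stability measure in Equation (9) can be computed as the cosine similarity of two equally long share vectors. The function name `stability` and the handling of an all-zero vector (raising an error) are assumptions made for illustration.

```python
from math import sqrt


def stability(s1, s2):
    """Cosine similarity between two vectors of per-municipality shares."""
    if len(s1) != len(s2):
        raise ValueError("both vectors must cover the same municipalities")
    dot = sum(a * b for a, b in zip(s1, s2))
    norm = sqrt(sum(a * a for a in s1)) * sqrt(sum(b * b for b in s2))
    if norm == 0:
        # Assumption: the measure is left undefined when a vector is all zeros.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm
```

Because cosine similarity depends only on the direction of the vectors, two periods whose shares differ by a common factor across all municipalities (e.g., `[0.4, 0.2]` and `[0.8, 0.4]`, hypothetical values) still yield a stability of 1.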
Example 2.
We keep the threshold value and the periods adopted in Example 1 (i.e., $\tau = 170$ m, $P_1$ and $P_2$) and estimate the stability of the clustering results for the city of Belgrade. The results obtained when the above algorithm is applied to traffic accident data collected in all municipalities over periods $P_1$ and $P_2$ are given in Table 3. The particular elements of vectors $S_1(M, \tau_j, P_1)$ and $S_2(M, \tau_j, P_2)$ defined in Equation (8) are given in the fourth and seventh columns of the table. Following Equation (9), the stability of clustering results for the adopted parameter settings is estimated as
$$s(M, \tau, P_1, P_2) = 0.990. \tag{10}$$

2.2. Inter-Period Accident Spatial Collocation

We define the city-level inter-period accident spatial collocation index as the share (i.e., percentage) of all traffic accidents that occurred in city $M$ over period $P_2$ and belong to the areas covered by the clusters obtained when the clustering algorithm was applied to the set of all traffic accidents with injuries or death that occurred in $M$ over period $P_1$. This index is denoted as $c(M, \tau_j, P_1, P_2)$.
Example 3.
In the last row of Table 3, the following can be observed:
  • The total number of accidents with injuries or death over period $P_2$ is 4072.
  • The number of accidents that occurred over this period and belong to the areas covered by the clusters obtained when the clustering algorithm was applied to the set of all traffic accidents with injuries or death that occurred over period $P_1$ is 1588.
The resulting inter-period accident spatial collocation is
$$c(M, \tau, P_1, P_2) = \frac{1588}{4072} = 38.998\%. \tag{11}$$

2.3. Relative Size of Selected Clusters

In our approach, the area covered by a cluster of traffic accidents is conceptualized as the area of the minimum bounding box of its convex hull (cf. Figure 1c). In line with this conceptualization, we define the relative size of selected clusters as the share of the area of city $M$ covered by the clusters obtained when the clustering algorithm is separately applied to the sets of traffic accidents that occurred in all municipalities of $M$ over period $P_1$. The city-level relative cluster size is denoted as $r(M, \tau_j, P_1)$.
Example 4.
Adopting the same input parameter settings as in Example 2, for each municipality, Table 4 provides the number of the selected clusters, the area covered by the selected clusters, the area of the municipality and the municipality-level relative size of the selected clusters. The resulting city-level relative size of selected clusters can be derived from the data given in the last row of Table 4:
$$r(M, \tau, P_1) = \frac{26.502\ \mathrm{km}^2}{3231.469\ \mathrm{km}^2} = 0.820\%. \tag{12}$$

2.4. Integrated Measure for Traffic Accident Clustering

The clustering algorithm introduced in [19] is designed to automatically detect and select critical road segments, and is intended for application in circumstances of limited human or technical resources for traffic monitoring and management. In line with this, we introduce an integrated measure for traffic accident clustering that promotes the stability of clustering results and the inter-period accident spatial collocation index, and penalizes the size of selected clusters. For given city $M$, threshold value $\tau_j$ and periods $P_1$ and $P_2$, the integrated measure is defined as
$$\eta(M, \tau_j, P_1, P_2) = \frac{s(M, \tau_j, P_1, P_2) \cdot c(M, \tau_j, P_1, P_2)}{r(M, \tau_j, P_1)}, \tag{13}$$
where we have the following:
  • $s(M, \tau_j, P_1, P_2)$ represents the stability of the clustering results;
  • $c(M, \tau_j, P_1, P_2)$ represents the inter-period accident spatial collocation index;
  • $r(M, \tau_j, P_1)$ represents the city-level relative size of selected clusters.
Example 5.
Taking Equations (10)–(12) into account, we can calculate the value of the introduced integrated measure:
$$\eta(M, \tau, P_1, P_2) = 47.082. \tag{14}$$
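The calculation in Example 5 can be reproduced approximately from the rounded values reported in Examples 2 to 4. The sketch below is illustrative only; the function name `integrated_measure` is an assumption, and $c$ and $r$ are both passed as percentages so that their ratio is dimensionless. With the rounded inputs the result is about 47.08; the small difference in the last digit from the reported value presumably stems from unrounded intermediate values.

```python
def integrated_measure(s, c, r):
    """Stability times collocation, divided by the relative cluster size."""
    if r <= 0:
        raise ValueError("relative cluster size must be positive")
    return s * c / r


# Rounded values from Examples 2-4 (c and r in percent).
eta = integrated_measure(s=0.990, c=38.998, r=0.820)
print(f"{eta:.2f}")
```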

3. Threshold Selection

For given city $M$ and periods $P_1$ and $P_2$, the integrated measure for traffic accident clustering introduced in the previous section can be considered a function with one input parameter, the threshold value, i.e., $\eta(\tau)$. This reduction allows for applying the clustering algorithm introduced in [19] repeatedly to traffic accidents that occurred in city $M$ over periods $P_1$ and $P_2$ by varying its input threshold value $\tau$. This section introduces an algorithm for the selection of an operational threshold value based on the integrated measure defined in Equation (13).
In our approach, the operational threshold value is indicated by a knee point of the plot of the integrated measure $\eta(\tau)$ versus the applied threshold value $\tau$. Thus, we present an approach for knee-point detection. Let $\hat{D}$ be a dataset containing $n$ observations for which a knee point should be detected:
$$\hat{D} = \{(\hat{\tau}_i, \hat{\eta}_i) \mid 1 \le i \le n \wedge \hat{\tau}_i \ge 0 \wedge \hat{\eta}_i \ge 0\}, \tag{15}$$
where $\hat{\tau}_i$ represents a threshold value, $\hat{\eta}_i$ represents the integrated measure value obtained for $\hat{\tau}_i$, and threshold values $\hat{\tau}_i$ are evenly spaced, i.e.,
$$(\exists\, t \in \mathbb{R},\ t > 0)(\forall\, 1 \le i < n)(\hat{\tau}_{i+1} - \hat{\tau}_i = t). \tag{16}$$
First, values $\hat{\tau}_i$ and $\hat{\eta}_i$ are normalized to range $[0, 1]$ without changing the distribution of the data [24], i.e.,
$$\bar{D} = \left\{(\bar{\tau}_i, \bar{\eta}_i) \;\middle|\; \bar{\tau}_i = \frac{\hat{\tau}_i - \hat{\tau}_{min}}{\hat{\tau}_{max} - \hat{\tau}_{min}} \wedge \bar{\eta}_i = \frac{\hat{\eta}_i - \hat{\eta}_{min}}{\hat{\eta}_{max} - \hat{\eta}_{min}} \wedge (\hat{\tau}_i, \hat{\eta}_i) \in \hat{D}\right\}, \tag{17}$$
where
$$\hat{\tau}_{min} = \min_{1 \le i \le n} \hat{\tau}_i, \qquad \hat{\eta}_{min} = \min_{1 \le i \le n} \hat{\eta}_i, \tag{18}$$
$$\hat{\tau}_{max} = \max_{1 \le i \le n} \hat{\tau}_i, \qquad \hat{\eta}_{max} = \max_{1 \le i \le n} \hat{\eta}_i. \tag{19}$$
To select knee-point candidates, we consider the differences between the normalized dataset points and the linear function $f(\tau) = 1 - \tau$, which represents the main diagonal of the unit square to which the original dataset was normalized (here, the diagonal joining points $(0, 1)$ and $(1, 0)$). Then, a new dataset that captures the difference distribution is derived as follows:
$$D = \{(\tau_i, \eta_i) \mid \tau_i = \bar{\tau}_i \wedge \eta_i = 1 - \bar{\tau}_i - \bar{\eta}_i \wedge (\bar{\tau}_i, \bar{\eta}_i) \in \bar{D}\}. \tag{20}$$
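The normalization and difference steps (Equations (17) and (20)) can be sketched as follows. The function name `difference_curve` is hypothetical, and the sketch assumes that neither the threshold values nor the measure values are all identical (otherwise the normalization would divide by zero).

```python
def difference_curve(taus, etas):
    """Normalize (tau, eta) pairs to the unit square and subtract them from f(tau) = 1 - tau."""
    t_min, t_max = min(taus), max(taus)
    e_min, e_max = min(etas), max(etas)
    curve = []
    for t, e in zip(taus, etas):
        t_bar = (t - t_min) / (t_max - t_min)  # normalized threshold
        e_bar = (e - e_min) / (e_max - e_min)  # normalized measure
        curve.append((t_bar, 1 - t_bar - e_bar))  # difference from the diagonal
    return curve
```

For example, with hypothetical inputs `taus = [100, 200, 300]` and `etas = [10, 2, 0]`, the middle point is normalized to $(0.5, 0.2)$ and its difference value is $1 - 0.5 - 0.2 = 0.3$, while both end points have difference 0.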
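To select a knee point, we identify the most concave point $(\tau^{*}, \eta^{*})$ in the curve representing difference distribution $D$. Thus, similar to [24], a set of knee-point candidates is defined as containing the points of salient concavity, i.e., it is selected by means of local maxima in set $D$:
$$K_1 = \{(\tau_i, \eta_i) \mid 1 < i < n \wedge \eta_i > \eta_{i-1} \wedge \eta_i > \eta_{i+1} \wedge (\tau_i, \eta_i) \in D\}. \tag{21}$$
If set $K_1$ is not empty, the concavity at any point $(\tau_i, \eta_i)$ in the set is estimated as the angle at that point:
$$\gamma_1(\tau_i, \eta_i) = \arctan\frac{\tau_i - \tau_{i-1}}{|\eta_i - \eta_{i-1}|} + \arctan\frac{\tau_{i+1} - \tau_i}{|\eta_{i+1} - \eta_i|} = \begin{cases} \arctan\left(\dfrac{\frac{\tau_i - \tau_{i-1}}{|\eta_i - \eta_{i-1}|} + \frac{\tau_{i+1} - \tau_i}{|\eta_{i+1} - \eta_i|}}{1 - \frac{\tau_i - \tau_{i-1}}{|\eta_i - \eta_{i-1}|} \cdot \frac{\tau_{i+1} - \tau_i}{|\eta_{i+1} - \eta_i|}}\right), & \text{if } \dfrac{\tau_i - \tau_{i-1}}{|\eta_i - \eta_{i-1}|} \cdot \dfrac{\tau_{i+1} - \tau_i}{|\eta_{i+1} - \eta_i|} < 1, \\[2ex] \arctan\left(\dfrac{\frac{\tau_i - \tau_{i-1}}{|\eta_i - \eta_{i-1}|} + \frac{\tau_{i+1} - \tau_i}{|\eta_{i+1} - \eta_i|}}{1 - \frac{\tau_i - \tau_{i-1}}{|\eta_i - \eta_{i-1}|} \cdot \frac{\tau_{i+1} - \tau_i}{|\eta_{i+1} - \eta_i|}}\right) + \pi, & \text{if } \dfrac{\tau_i - \tau_{i-1}}{|\eta_i - \eta_{i-1}|} \cdot \dfrac{\tau_{i+1} - \tau_i}{|\eta_{i+1} - \eta_i|} > 1, \\[2ex] \dfrac{\pi}{2}, & \text{otherwise}, \end{cases} \tag{22}$$
as illustrated in Figure 2a. The most concave point in set $K_1$ is selected by minimizing the estimated angle:
$$(\tau^{*}, \eta^{*}) = \operatorname*{argmin}_{(\tau_i, \eta_i) \in K_1} \gamma_1(\tau_i, \eta_i). \tag{23}$$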
Otherwise, if set $K_1$ is empty (i.e., difference distribution $D$ is monotonically decreasing), we relax condition (21). In this case, a set of knee-point candidates is defined as containing all concave points (i.e., not just salient ones) in the curve representing difference distribution $D$.
Having in mind that function $\eta(\tau)$ is discrete, where $\tau$-values are evenly spaced (cf. Equation (16)), its second derivative can be represented as
$$\frac{\partial^2 \eta(\tau = \tau_i)}{\partial \tau^2} = \frac{\partial}{\partial \tau}\left(\frac{\partial \eta(\tau = \tau_i)}{\partial \tau}\right) = \frac{\eta(\tau = \tau_{i+1}) - 2 \cdot \eta(\tau = \tau_i) + \eta(\tau = \tau_{i-1})}{t^2}. \tag{24}$$
Thus, a set of all concave points can be formally represented as a set of points in which the second derivative is less than zero:
$$K_2 = \{(\tau_i, \eta_i) \mid 1 < i < n \wedge 2\eta_i > \eta_{i-1} + \eta_{i+1} \wedge (\tau_i, \eta_i) \in D\}, \tag{25}$$
which is in line with the angle-based condition applied in [21]. The concavity at any point $(\tau_i, \eta_i)$ in set $K_2$ is estimated as the angle at that point:
$$\gamma_2(\tau_i, \eta_i) = \begin{cases} \arctan\dfrac{\tau_i - \tau_{i-1}}{|\eta_i - \eta_{i-1}|} - \arctan\dfrac{\tau_{i+1} - \tau_i}{|\eta_{i+1} - \eta_i|} + \pi, & \text{if } \eta_{i-1} < \eta_i < \eta_{i+1}, \\[2ex] \arctan\dfrac{\tau_{i+1} - \tau_i}{|\eta_{i+1} - \eta_i|} - \arctan\dfrac{\tau_i - \tau_{i-1}}{|\eta_i - \eta_{i-1}|} + \pi, & \text{if } \eta_{i-1} > \eta_i > \eta_{i+1}, \\[2ex] \arctan\dfrac{\tau_i - \tau_{i-1}}{|\eta_i - \eta_{i-1}|} + \dfrac{\pi}{2}, & \text{if } \eta_{i-1} < \eta_i = \eta_{i+1}, \\[2ex] \arctan\dfrac{\tau_{i+1} - \tau_i}{|\eta_{i+1} - \eta_i|} + \dfrac{\pi}{2}, & \text{if } \eta_{i-1} = \eta_i > \eta_{i+1}. \end{cases} \tag{26}$$
The first case in Equation (26) is illustrated in Figure 2b. The most concave point in set $K_2$ is selected by minimizing the estimated angle:
$$(\tau^{*}, \eta^{*}) = \operatorname*{argmin}_{(\tau_i, \eta_i) \in K_2} \gamma_2(\tau_i, \eta_i). \tag{27}$$
Finally, since value $\tau^{*}$ (derived either from Equation (23) or Equation (27)) is normalized (cf. Equation (17)), it should be denormalized to obtain the operational threshold value:
$$\hat{\tau}^{*} = \hat{\tau}_{min} + \tau^{*} (\hat{\tau}_{max} - \hat{\tau}_{min}). \tag{28}$$
It is easy to show that the set defined in Equation (25) is always a superset of the set defined in Equation (21). However, these two conditions are considered separately and in the stated particular order for the purpose of algorithm efficiency. The threshold selection algorithm is illustrated in the next section.
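The complete threshold selection procedure of this section can be sketched in Python as follows. This is a hedged illustration, not the authors' code: the function names are hypothetical, ties between equally concave candidates are broken in favor of the smallest threshold, `None` is returned when no concave point exists, and the inputs are assumed to be evenly spaced thresholds with non-constant measure values.

```python
from math import atan, inf, pi


def _term(d_tau, d_eta):
    """Return d_tau / |d_eta|, or +inf when the two eta values coincide."""
    return d_tau / abs(d_eta) if d_eta != 0 else inf


def _angles(d, i):
    """Arctangent terms for the segments to the left and right of point i."""
    left = atan(_term(d[i][0] - d[i - 1][0], d[i][1] - d[i - 1][1]))
    right = atan(_term(d[i + 1][0] - d[i][0], d[i + 1][1] - d[i][1]))
    return left, right


def gamma1(d, i):
    """Angle at a local maximum (sum of the two arctangent terms)."""
    left, right = _angles(d, i)
    return left + right


def gamma2(d, i):
    """Angle at a concave point that is not a local maximum (four cases)."""
    left, right = _angles(d, i)
    prev, cur, nxt = d[i - 1][1], d[i][1], d[i + 1][1]
    if prev < cur < nxt:
        return left - right + pi
    if prev > cur > nxt:
        return right - left + pi
    if prev < cur == nxt:
        return left + pi / 2
    if prev == cur > nxt:
        return right + pi / 2
    raise ValueError("point is not covered by the four cases")


def operational_threshold(taus, etas):
    """Select the knee point and return the denormalized threshold (or None)."""
    t_min, t_max = min(taus), max(taus)
    e_min, e_max = min(etas), max(etas)
    # Normalize to the unit square and take differences from f(tau) = 1 - tau.
    d = []
    for t, e in zip(taus, etas):
        t_bar = (t - t_min) / (t_max - t_min)
        e_bar = (e - e_min) / (e_max - e_min)
        d.append((t_bar, 1 - t_bar - e_bar))
    inner = range(1, len(d) - 1)
    # Salient candidates: strict local maxima of the difference curve.
    k1 = [i for i in inner if d[i][1] > d[i - 1][1] and d[i][1] > d[i + 1][1]]
    if k1:
        best = min(k1, key=lambda i: gamma1(d, i))
    else:
        # Relaxed candidates: all points with a negative second difference.
        k2 = [i for i in inner if 2 * d[i][1] > d[i - 1][1] + d[i + 1][1]]
        if not k2:
            return None
        best = min(k2, key=lambda i: gamma2(d, i))
    # Denormalize the selected threshold.
    return t_min + d[best][0] * (t_max - t_min)
```

Checking the local-maximum condition first mirrors the order stated above; the relaxed condition is evaluated only when no salient candidate exists.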

4. Results

The proposed approach is applied to a set of 18,880 real-life traffic accidents with injuries or death (cf. Table 2). Following the idea presented in Section 3, we consider the following sequence of threshold values:
$$T = \langle 100\ \mathrm{m}, 110\ \mathrm{m}, 120\ \mathrm{m}, \ldots, 400\ \mathrm{m} \rangle, \tag{29}$$
among which we select an operational threshold value.
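A threshold sweep over this sequence can be sketched as follows. The helper names are illustrative, and `measure` is a hypothetical callable standing in for the evaluation of the integrated measure at a given threshold (which in practice requires running the clustering for both periods).

```python
def threshold_sequence(start=100, stop=400, step=10):
    """Evenly spaced threshold values in metres, both end points included."""
    return list(range(start, stop + 1, step))


def sweep(thresholds, measure):
    """Evaluate the measure at every threshold and return (tau, eta) pairs."""
    return [(tau, measure(tau)) for tau in thresholds]


thresholds = threshold_sequence()
print(len(thresholds))  # 31 candidate threshold values
```

The constant step of 10 m satisfies the even-spacing condition required by the knee-point detection in Section 3.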
The values of all the measures introduced in Section 2 obtained for the city of Belgrade are given in Table 5. The plot of the normalized integrated measure $\eta$ versus the applied normalized threshold value $\tau$ is given in Figure 3a. In addition, the derived differences between the normalized dataset points and the main diagonal of the unit square are shown in Figure 3b. The operational threshold value is estimated as
$$\hat{\tau}^{*}(M = \text{Belgrade}, P_1 = [2019\text{–}2020], P_2 = [2021]) = 150\ \mathrm{m}. \tag{30}$$
The differences between the normalized dataset points and the main diagonal of the unit square obtained for the cities of Novi Sad and Niš are shown in Figure 3c,d, respectively. The operational threshold values are estimated as
$$\begin{aligned} \hat{\tau}^{*}(M = \text{Novi Sad}, P_1 = [2019\text{–}2020], P_2 = [2021]) &= 170\ \mathrm{m}, \\ \hat{\tau}^{*}(M = \text{Niš}, P_1 = [2019\text{–}2020], P_2 = [2021]) &= 160\ \mathrm{m}. \end{aligned} \tag{31}$$
The clustering results obtained by applying the automatically determined threshold values for the considered cities (cf. Equations (30) and (31)) are provided in Table 6, Table 7 and Table 8, respectively, including the numbers of the selected clusters for each municipality, the area covered by the selected clusters, the area of the municipalities, the municipality-level relative size of the selected clusters and the city-level relative size of selected clusters.

5. Discussion

The results reported in the previous section demonstrate the stability of the algorithm results. From Equations (30) and (31), it can be observed that the algorithm computes similar threshold values for all of the considered cities (150 m, 170 m and 160 m). In addition, the areas of the selected clusters represent just 0.522, 0.069, and 0.063 percent of the city area, respectively, cf. Table 6, Table 7 and Table 8.
However, in order to practically validate the proposed approach, we evaluate the obtained results externally, i.e., with respect to the locations of traffic cameras in Belgrade. The installation of traffic cameras has been a long-term process. However, we decided to consider the locations of traffic cameras at a particular moment, i.e., August 2020 [28], for the following reasons:
  • Data availability: The information on the locations of traffic cameras as of August 2020 can be derived from the publicly available information provided by the Ministry of Interior of the Republic of Serbia [28].
  • External criterion: The locations are determined based on expert analysis performed by a third party and independently of this study. The camera locations are indicative, inter alia, of traffic hotspots and may be used as an external evaluation criterion.
  • Time appropriateness: We consider the locations of traffic cameras in an early installation phase, under the assumption that the installation has started with the most critical traffic hotspots. In addition, the clusters are obtained by applying the clustering algorithm to traffic accidents that occurred over period $P_1$. The selected “ground-truth” moment (i.e., August 2020) is close in time to the end of period $P_1$.
  • The considered cameras were not put into official use during periods $P_1$ and $P_2$, i.e., they did not influence the traffic participant behavior during these periods.
As of August 2020, 154 camera poles (carrying 392 cameras) were installed in 9 of 17 municipalities in Belgrade. The clustering results for these nine “inner” municipalities and the camera pole locations are illustrated in Figure 4.
The distribution of camera poles and cameras across these municipalities is given in the first three columns of Table 9. The numbers and the shares of camera poles and cameras covered by the clusters obtained by applying the threshold value (30) are given in the last four columns of this table.
The clustering results can be summarized as follows. The total area of the clusters represents only 0.522 percent of the city area (cf. Table 6) and covers 40.26 percent of all camera poles and 35.97 percent of cameras in the city (cf. Table 9). These results may be considered satisfactory, especially keeping in mind our goal to introduce an approach suitable for application in circumstances of limited human or technical resources for traffic monitoring and management. In addition, one of the municipalities (i.e., Novi Beograd, cf. Figure 4a) was more comprehensively covered by cameras. The area of this municipality represents only 1.26 percent of the city area (i.e., 40.756 km$^2$ of 3231.469 km$^2$, cf. Table 6) but contains 47.40 percent of all camera poles (i.e., 73 of 154 camera poles) and 55.87 percent of all cameras (i.e., 219 of 392, cf. Table 9). The distribution of cameras in this municipality was determined by reasons that were not exclusively related to traffic and thus is not primarily indicative of traffic hotspots. If we exclude this municipality from consideration, the remaining clusters cover 60.49 percent of the remaining camera poles and 63.58 percent of the remaining cameras.

6. Conclusions

This paper introduced a parameter-free approach to traffic accident clustering in urban areas intended for the determination of road segments of spatially prolonged and high traffic accident risk. At the specification level, the proposed algorithm promotes the stability of clustering results through time and inter-period accident spatial collocation, and penalizes the size of the selected clusters. To illustrate the proposed approach, it was applied to a set of 18,880 real-life traffic accidents with injuries or death that occurred in three of the largest cities in Serbia over a three-year period.
The reported results demonstrated the stability of the algorithm results, i.e., the algorithm computed similar threshold values for all of the considered cities. In addition, the clustering results obtained for Belgrade were positively evaluated with respect to an external criterion, i.e., with respect to the locations of traffic cameras. The total area of the clusters represents only 0.522 percent of the city area and covers 40.26 percent of all camera poles and 35.97 percent of cameras in the city. Finally, it should be noted that the proposed approach can be applied to any urban area with a hierarchically organized traffic control jurisdiction.

Author Contributions

Conceptualization, M.G.; methodology, I.K. and M.G.; software, I.K., M.G. and N.M.; validation, I.K. and M.G.; formal analysis, I.K., M.G., N.M. and D.J.; investigation, I.K., M.G., N.M. and D.J.; writing—original draft preparation, M.G.; writing—review and editing, I.K. and M.G. All authors have read and agreed to the published version of the manuscript.

Funding

The work of M.G. was partially funded by the Ministry of Education, Science and Technological Development of the Republic of Serbia, under the Research Grants III44008 and TR32035. The work of D.J. was funded by the Ministry of Education, Science and Technological Development of the Republic of Serbia Grant, under the Research Grant No. 337-00-426/2021-09, and by the National Key R&D Program of China under the Research Grant No. 2021YFE0110500.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Publicly available datasets were analyzed in this study. These data can be found here: [25,26,27,28]. The ArcGIS shapefiles were obtained from the Republic Geodetic Authority, Serbia.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Zhao, Y.; Guo, X.; Su, B.; Sun, Y.; Zhu, Y. Multi-Lane Traffic Load Clustering Model for Long-Span Bridge Based on Parameter Correlation. Mathematics 2023, 11, 274.
  2. Zang, J.; Jiao, P.; Liu, S.; Zhang, X.; Song, G.; Yu, L. Identifying Traffic Congestion Patterns of Urban Road Network Based on Traffic Performance Index. Sustainability 2023, 15, 948.
  3. Shang, Q.; Yu, Y.; Xie, T. A Hybrid Method for Traffic State Classification Using K-Medoids Clustering and Self-Tuning Spectral Clustering. Sustainability 2022, 14, 11068.
  4. Hernández, H.; Alberdi, E.; Pérez-Acebo, H.; Álvarez, I.; García, M.J.; Eguia, I.; Fernández, K. Managing Traffic Data through Clustering and Radial Basis Functions. Sustainability 2021, 13, 2846.
  5. Zhang, Y.; Ye, N.; Wang, R.; Malekian, R. A Method for Traffic Congestion Clustering Judgment Based on Grey Relational Analysis. ISPRS Int. J. Geo-Inf. 2016, 5, 71.
  6. Esenturk, E.; Turley, D.; Wallace, A.; Khastgir, S.; Jennings, P. A data mining approach for traffic accidents, pattern extraction and test scenario generation for autonomous vehicles. Int. J. Transp. Sci. Technol. 2022; in press, corrected proof.
  7. Esenturk, E.; Wallace, A.G.; Khastgir, S.; Jennings, P. Identification of Traffic Accident Patterns via Cluster Analysis and Test Scenario Development for Autonomous Vehicles. IEEE Access 2022, 10, 6660–6675.
  8. Niu, Z.; Wang, Y.; Sun, S. Correlation Analysis of Traffic Accident Factors based on Mean Clustering. In ICCSIE ’22, Proceedings of the 7th International Conference on Cyber Security and Information Engineering, Brisbane, Australia, 23–25 September 2022; Association for Computing Machinery: New York, NY, USA, 2022; pp. 569–575.
  9. Bokaba, T.; Doorsamy, W.; Paul, B.S. Comparative Study of Machine Learning Classifiers for Modelling Road Traffic Accidents. Appl. Sci. 2022, 12, 828.
  10. Wang, D.; Huang, Y.; Cai, Z. A two-phase clustering approach for traffic accident black spots identification: Integrated GIS-based processing and HDBSCAN model. Int. J. Inj. Control. Saf. Promot. 2023; published online.
  11. Li, Y.; Huang, M. Identification of Critical Road Links Based on Static and Dynamic Features Fusion. Appl. Sci. 2023, 13, 5994.
  12. Chen, S.; Cheng, K.; Yang, J.; Zang, X.; Luo, Q.; Li, J. Driving Behavior Risk Measurement and Cluster Analysis Driven by Vehicle Trajectory Data. Appl. Sci. 2023, 13, 5675.
  13. Shah, M.A.; Zeeshan Khan, F.; Abbas, G.; Abbas, Z.H.; Ali, J.; Aljameel, S.S.; Khan, I.U.; Aslam, N. Optimal Path Routing Protocol for Warning Messages Dissemination for Highway VANET. Sensors 2022, 22, 6839.
  14. Rampinelli, A.; Calderón, J.F.; Blazquez, C.A.; Sauer-Brand, K.; Hamann, N.; Nazif-Munoz, J.I. Investigating the Risk Factors Associated with Injury Severity in Pedestrian Crashes in Santiago, Chile. Int. J. Environ. Res. Public Health 2022, 19, 11126.
  15. Lilhore, U.K.; Imoize, A.L.; Li, C.-T.; Simaiya, S.; Pani, S.K.; Goyal, N.; Kumar, A.; Lee, C.-C. Design and Implementation of an ML and IoT Based Adaptive Traffic-Management System for Smart Cities. Sensors 2022, 22, 2908.
  16. Jeong, H.; Kim, I.; Han, K.; Kim, J. Comprehensive Analysis of Traffic Accidents in Seoul: Major Factors and Types Affecting Injury Severity. Appl. Sci. 2022, 12, 1790.
  17. Baek, J. Highway Regional Classification Method Based on Traffic Flow Characteristics for Highway Safety Assessment. Sensors 2022, 22, 86.
  18. Bajada, T.; Attard, M. A typological and spatial analysis of pedestrian fatalities and injuries in Malta. Res. Transp. Econ. 2021, 86, 101023.
  19. Gnjatović, M.; Košanin, I.; Maček, N.; Joksimović, D. Clustering of Road Traffic Accidents as a Gestalt Problem. Appl. Sci. 2022, 12, 4543.
  20. Shih, F.Y. Image Processing and Pattern Recognition: Fundamentals and Techniques; Wiley-IEEE Press: Hoboken, NJ, USA, 2010.
  21. Zhao, Q.; Hautamaki, V.; Fränti, P. Knee Point Detection in BIC for Detecting the Number of Clusters. In Advanced Concepts for Intelligent Vision Systems (ACIVS 2008); Blanc-Talon, J., Bourennane, S., Philips, W., Popescu, D., Scheunders, P., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2008; Volume 5259, pp. 664–673.
  22. Islam, M.R.; Jenny, I.J.; Nayon, M.; Islam, M.R.; Amiruzzaman, M.; Abdullah-Al-Wadud, M. Clustering algorithms to analyze the road traffic crashes. In Proceedings of the 2021 International Conference on Science & Contemporary Technologies (ICSCT), Dhaka, Bangladesh, 5–7 August 2021.
  23. Tibshirani, R.; Walther, G.; Hastie, T. Estimating the number of clusters in a data set via the gap statistic. J. R. Stat. Soc. Ser. B 2001, 63, 411–423.
  24. Satopää, V.; Albrecht, J.; Irwin, D.; Raghavan, B. Finding a “Kneedle” in a Haystack: Detecting Knee Points in System Behavior. In ICDCSW ’11, Proceedings of the 2011 31st International Conference on Distributed Computing Systems Workshops, Washington, DC, USA, 20–24 June 2011; Association for Computing Machinery: Minneapolis, MN, USA, 2011; pp. 166–171.
  25. Republic of Serbia. Data on Traffic Accidents for 2021 for the Territory of all Police Administrations and Municipalities. Available online: https://data.gov.rs/s/resources/podatsi-o-saobratshajnim-nezgodama-po-politsijskim-upravama-i-opshtinama/20220125-085458/nez-opendata-2021-20220125.xlsx (accessed on 1 March 2022).
  26. Republic of Serbia. Data on Traffic Accidents for 2020 for the Territory of all Police Administrations and Municipalities. Available online: https://data.gov.rs/s/resources/podatsi-o-saobratshajnim-nezgodama-po-politsijskim-upravama-i-opshtinama/20210208-095135/nez-opendata-2020-20210125.xlsx (accessed on 1 March 2022).
  27. Republic of Serbia. Data on Traffic Accidents for 2019 for the Territory of all Police Administrations and Municipalities. Available online: https://data.gov.rs/s/resources/podatsi-o-saobratshajnim-nezgodama-po-politsijskim-upravama-i-opshtinama/20200127-133136/nez-opendata-2019-20200125.xlsx (accessed on 1 March 2022).
  28. Ministry of Interior, Republic of Serbia. List of Locations of Video Surveillance System Camera Sites in the City of Belgrade. Available online: http://www.mup.rs/wps/wcm/connect/56a5cf77-df71-440a-a5bd-0d5f92ec8336/lat-Tabela-prelazi.pdf?MOD=AJPERES&CVID=ng1rX1 (accessed on 12 March 2023). (In Serbian).
Figure 1. (a) All traffic accidents with injuries or death that occurred in the municipality of Zvezdara over period $P_1$. (b) Four obtained clusters. (c) All traffic accidents with injuries or death that occurred in the municipality of Zvezdara over period $P_2$. In addition, for each cluster of traffic accidents in (b), the minimum bounding box of its convex hull is represented in (c). The maps were generated using the ArcMap component of Esri’s ArcGIS suite (https://www.esri.com, accessed on 1 March 2023).
Figure 2. (a) A salient concavity (i.e., a local maximum point) and (b) a non-salient concavity in the difference distribution $D$.
Figure 3. (a) The plot of the normalized integrated measure $\eta$ versus the applied normalized threshold value $\tau$ for Belgrade. (b–d) The plots of differences between the normalized dataset points and the main diagonal of the unit square for Belgrade, Novi Sad and Niš, respectively.
Figure 4. Nine “inner” municipalities of the city of Belgrade with traffic cameras as of August 2020. For each municipality, the camera pole locations and the clusters obtained by applying the threshold value $\tau = 150$ m are indicated. Each cluster of traffic accidents is represented by the minimum bounding box of its convex hull. The maps were generated using the ArcMap component of Esri’s ArcGIS suite (https://www.esri.com, accessed on 1 March 2023).
Table 1. Summary of some recent studies that employ clustering in the context of traffic safety.

| Ref. | Task | Clustering Approach |
|---|---|---|
| [1] | traffic load analysis | improved k-means clustering algorithm |
| [2] | traffic congestion analysis | self-organizing maps neural network |
| [3] | traffic state classification | k-medoids algorithm |
| [4] | road network level identification | k-means algorithm |
| [5] | traffic congestion analysis | grey relational clustering model |
| [6] | traffic accidents and pattern extraction | ROCK algorithm |
| [7] | traffic accident pattern identification | COOLCAT algorithm |
| [8] | traffic accident factor analysis | k-means algorithm |
| [9] | road traffic accident modeling | a comparative study of machine learning classifiers |
| [10] | traffic accident black spots identification | HDBSCAN algorithm |
| [11] | traffic congestion analysis | k-means algorithm |
| [12] | driving behavior risk analysis | k-means algorithm |
| [13] | optimal path routing | a modified k-medoids algorithm |
| [14] | analysis of pedestrian crash fatalities and severe injuries | KDE method |
| [15] | traffic-management system | DBSCAN algorithm |
| [16] | severity of traffic accident analysis | DBSCAN algorithm |
| [17] | highway safety assessment | k-means algorithm |
| [18] | pedestrian crash severity analysis | KDE method |
| [19] | detection of road segments of spatially prolonged and high traffic accident risk | a clustering algorithm based on the Gestalt principle of proximity |
Table 2. Traffic accidents with injuries or death.

| City | 2019 | 2020 | 2021 | Total |
|---|---|---|---|---|
| Beograd | 4684 | 3720 | 4072 | 12,476 |
| Novi Sad | 1710 | 1464 | 1574 | 4748 |
| Niš | 607 | 521 | 528 | 1656 |
| Total | 7001 | 5705 | 6174 | 18,880 |
Table 3. Estimating the stability of clustering results through time. The algorithm is applied to traffic accident data collected in Belgrade over periods $P_1$ = [2019–2020] and $P_2$ = [2021] and for the arbitrarily selected threshold value $\hat{\tau} = 170$ m. All decimal numbers are rounded to three decimal places.

| Municipality | # Accidents ($P_1$) | # Selected Accidents ($P_1$) | Share [%], $s_1(m_1, \tau_j)$ | # Accidents ($P_2$) | # Selected Accidents ($P_2$) | Share [%], $s_2(m_1, \tau_j)$ |
|---|---|---|---|---|---|---|
| Barajevo | 123 | 54 | 43.902 | 51 | 21 | 41.176 |
| Grocka | 295 | 105 | 35.593 | 150 | 40 | 26.667 |
| Lazarevac | 281 | 98 | 34.875 | 123 | 31 | 25.203 |
| Mladenovac | 206 | 59 | 28.641 | 113 | 30 | 26.549 |
| Novi Beograd | 1126 | 401 | 35.613 | 537 | 186 | 34.637 |
| Obrenovac | 376 | 129 | 34.309 | 220 | 59 | 26.818 |
| Palilula | 962 | 308 | 32.017 | 384 | 132 | 34.375 |
| Rakovica | 297 | 105 | 35.354 | 141 | 44 | 31.206 |
| Savski venac | 572 | 311 | 54.371 | 270 | 155 | 57.407 |
| Sopot | 100 | 28 | 28.000 | 49 | 10 | 20.408 |
| Stari grad | 316 | 263 | 83.228 | 176 | 145 | 82.386 |
| Surčin | 231 | 86 | 37.229 | 125 | 30 | 24.000 |
| Voždovac | 879 | 324 | 36.860 | 447 | 177 | 39.597 |
| Vračar | 412 | 272 | 66.019 | 182 | 139 | 76.374 |
| Zemun | 800 | 257 | 32.125 | 393 | 126 | 32.061 |
| Zvezdara | 631 | 257 | 40.729 | 317 | 131 | 41.325 |
| Čukarica | 797 | 258 | 32.371 | 394 | 132 | 33.503 |
| Total | 8404 | 3315 | 39.446 | 4072 | 1588 | 38.998 |
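The Share [%] columns in Table 3 are consistent with the ratio of selected accidents to all accidents in a municipality, expressed as a percentage. The following minimal sketch recomputes a few rows under that assumption; the function name `share_percent` and the `rows` dictionary are illustrative and not taken from the paper.

```python
def share_percent(selected: int, total: int) -> float:
    """Percentage of accidents that fall inside the selected clusters."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * selected / total, 3)


# (accidents, selected accidents) for P1 and P2, copied from Table 3
rows = {
    "Barajevo": ((123, 54), (51, 21)),
    "Stari grad": ((316, 263), (176, 145)),
    "Total": ((8404, 3315), (4072, 1588)),
}

for name, ((n1, k1), (n2, k2)) in rows.items():
    # Print the recomputed shares for both periods side by side
    print(name, share_percent(k1, n1), share_percent(k2, n2))
```

Comparing the two shares per municipality gives a quick impression of how much the selected fraction shifts between the periods; it is not the stability measure itself, whose definition is given in the body of the paper.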
Table 4. Relative sizes of all selected clusters in Belgrade over period $P_1$ and for the arbitrarily selected threshold value $\hat{\tau} = 170$ m. All decimal numbers are rounded to three decimal places.

| Municipality | # Selected Clusters | Area of Selected Clusters [km²] | Municipality Area [km²] | Relative Cluster Size [%] |
|---|---|---|---|---|
| Barajevo | 19 | 0.011 | 212.831 | 0.005 |
| Grocka | 22 | 0.072 | 299.349 | 0.024 |
| Lazarevac | 21 | 0.123 | 382.540 | 0.032 |
| Mladenovac | 9 | 0.151 | 338.764 | 0.045 |
| Novi Beograd | 1 | 7.057 | 40.756 | 17.316 |
| Obrenovac | 3 | 0.644 | 409.588 | 0.157 |
| Palilula | 2 | 4.563 | 450.351 | 1.013 |
| Rakovica | 7 | 0.222 | 30.025 | 0.739 |
| Savski venac | 2 | 2.321 | 14.082 | 16.484 |
| Sopot | 11 | 0.003 | 270.506 | 0.001 |
| Stari grad | 1 | 2.232 | 5.376 | 41.527 |
| Surčin | 20 | 0.053 | 288.303 | 0.018 |
| Voždovac | 3 | 2.233 | 148.409 | 1.505 |
| Vračar | 1 | 2.424 | 2.911 | 83.256 |
| Zemun | 8 | 1.500 | 149.682 | 1.002 |
| Zvezdara | 4 | 1.612 | 31.087 | 5.186 |
| Čukarica | 6 | 1.280 | 156.909 | 0.815 |
| Total | 140 | 26.502 | 3231.469 | 0.820 |
Table 5. The measures obtained when the introduced algorithm is applied, for each threshold value in sequence (29), to traffic accidents that occurred in Belgrade over periods $P_1$ and $P_2$. All decimal numbers are rounded to three decimal places.

| Threshold Value $\hat{\tau}$ [m] | Stability of Clustering Results $s(\hat{\tau})$ | Relative Size of Selected Clusters $r(\hat{\tau})$ | Inter-Period Spatial Collocation $c(\hat{\tau})$ | Integrated Measure for Traffic Accident Clustering $\hat{\eta}(\hat{\tau})$ |
|---|---|---|---|---|
| 100 | 0.980 | 0.001 | 0.297 | 302.465 |
| 110 | 0.982 | 0.001 | 0.286 | 220.035 |
| 120 | 0.975 | 0.002 | 0.327 | 176.388 |
| 130 | 0.980 | 0.002 | 0.319 | 127.509 |
| 140 | 0.981 | 0.004 | 0.368 | 94.753 |
| 150 | 0.978 | 0.005 | 0.329 | 61.536 |
| 160 | 0.987 | 0.007 | 0.363 | 54.953 |
| 170 | 0.990 | 0.008 | 0.390 | 47.082 |
| 180 | 0.991 | 0.010 | 0.416 | 40.827 |
| 190 | 0.992 | 0.011 | 0.440 | 38.685 |
| 200 | 0.993 | 0.013 | 0.454 | 35.308 |
| 210 | 0.994 | 0.014 | 0.466 | 33.419 |
| 220 | 0.993 | 0.015 | 0.461 | 31.504 |
| 230 | 0.994 | 0.017 | 0.477 | 27.453 |
| 240 | 0.995 | 0.019 | 0.510 | 26.685 |
| 250 | 0.996 | 0.022 | 0.532 | 24.345 |
| 260 | 0.996 | 0.024 | 0.569 | 23.539 |
| 270 | 0.996 | 0.025 | 0.591 | 23.275 |
| 280 | 0.997 | 0.026 | 0.587 | 22.600 |
| 290 | 0.997 | 0.028 | 0.594 | 21.453 |
| 300 | 0.997 | 0.029 | 0.589 | 20.412 |
| 310 | 0.997 | 0.030 | 0.601 | 19.732 |
| 320 | 0.998 | 0.035 | 0.629 | 17.872 |
| 330 | 0.998 | 0.036 | 0.637 | 17.489 |
| 340 | 0.998 | 0.037 | 0.641 | 17.186 |
| 350 | 0.998 | 0.037 | 0.642 | 17.118 |
| 360 | 0.999 | 0.038 | 0.647 | 17.047 |
| 370 | 0.999 | 0.040 | 0.661 | 16.406 |
| 380 | 0.999 | 0.041 | 0.667 | 16.178 |
| 390 | 0.999 | 0.045 | 0.682 | 15.263 |
| 400 | 0.999 | 0.048 | 0.690 | 14.371 |
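To illustrate how a knee point can be read off the integrated measure in Table 5, the sketch below normalizes the threshold values and the measure to the unit square and takes the largest gap to the main diagonal, in the spirit of the Kneedle approach [24]. It is a simplified assumption-based sketch: flipping the decreasing curve and taking the global maximum are choices made here for illustration, and the paper's own saliency criterion for concavities (Figure 2) is not reproduced. The names `normalize`, `diff` and `knee_tau` are hypothetical.

```python
# Threshold values [m] and integrated measure, copied from Table 5
tau = list(range(100, 401, 10))
eta = [302.465, 220.035, 176.388, 127.509, 94.753, 61.536, 54.953, 47.082,
       40.827, 38.685, 35.308, 33.419, 31.504, 27.453, 26.685, 24.345,
       23.539, 23.275, 22.600, 21.453, 20.412, 19.732, 17.872, 17.489,
       17.186, 17.118, 17.047, 16.406, 16.178, 15.263, 14.371]


def normalize(values):
    """Min-max scale a sequence to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


x = normalize(tau)
y = normalize(eta)

# Assumption: flip the decreasing curve so that it rises, then measure
# its vertical distance above the main diagonal y = x of the unit square.
diff = [(1.0 - yi) - xi for xi, yi in zip(x, y)]

# Index of the largest difference, i.e. the candidate knee point
knee_index = max(range(len(diff)), key=diff.__getitem__)
knee_tau = tau[knee_index]
print(knee_tau)
```

On the rounded values of Table 5 the largest difference falls at 150 m, which agrees with the threshold reported for Belgrade in the caption of Table 6.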
Table 6. The clustering results for Belgrade ($\hat{\tau}^{*} = 150$ m). All decimal numbers are rounded to three decimal places.

| Municipality | # Selected Clusters | Area of Selected Clusters [km²] | Municipality Area [km²] | Relative Cluster Size [%] |
|---|---|---|---|---|
| Barajevo | 19 | 0.008197 | 212.831 | 0.004 |
| Grocka | 57 | 0.045736 | 299.349 | 0.015 |
| Lazarevac | 21 | 0.085380 | 382.540 | 0.022 |
| Mladenovac | 14 | 0.049522 | 338.764 | 0.015 |
| Novi Beograd | 1 | 3.575020 | 40.756 | 8.772 |
| Obrenovac | 3 | 0.516352 | 409.588 | 0.126 |
| Palilula | 2 | 4.257652 | 450.351 | 0.945 |
| Rakovica | 6 | 0.132195 | 30.025 | 0.440 |
| Savski venac | 2 | 1.855433 | 14.082 | 13.176 |
| Sopot | 11 | 0.003422 | 270.506 | 0.001 |
| Stari grad | 1 | 1.196700 | 5.376 | 22.260 |
| Surčin | 41 | 0.035323 | 288.303 | 0.012 |
| Voždovac | 4 | 1.324443 | 148.409 | 0.892 |
| Vračar | 2 | 1.171392 | 2.911 | 40.240 |
| Zemun | 8 | 0.653254 | 149.682 | 0.436 |
| Zvezdara | 6 | 1.064036 | 31.087 | 3.423 |
| Čukarica | 5 | 0.898991 | 156.909 | 0.573 |
| Total | 203 | 16.873 | 3231.469 | 0.522 |
Table 7. The clustering results for Novi Sad ($\hat{\tau}^{*} = 170$ m). All decimal numbers are rounded to three decimal places.

| Municipality | # Selected Clusters | Area of Selected Clusters [km²] | Municipality Area [km²] | Relative Cluster Size [%] |
|---|---|---|---|---|
| Bač | 6 | 0.010 | 367.268 | 0.003 |
| Bačka Palanka | 19 | 0.052 | 589.496 | 0.009 |
| Bački Petrovac | 7 | 0.001 | 158.257 | 0.000 |
| Beočin | 8 | 0.001 | 184.105 | 0.001 |
| Bečej | 21 | 0.036 | 486.196 | 0.007 |
| Novi Sad | 2 | 2.523 | 698.816 | 0.361 |
| Srbobran | 13 | 0.005 | 283.939 | 0.002 |
| Sremski Karlovci | 8 | 0.001 | 50.538 | 0.002 |
| Temerin | 14 | 0.031 | 169.525 | 0.019 |
| Titel | 1 | 0.000 | 260.600 | 0.000 |
| Vrbas | 10 | 0.131 | 375.326 | 0.035 |
| Žabalj | 18 | 0.003 | 399.566 | 0.001 |
| Total | 127 | 2.794 | 4023.633 | 0.069 |
Table 8. The clustering results for Niš ($\hat{\tau}^{*} = 160$ m). All decimal numbers are rounded to three decimal places.

| Municipality | # Selected Clusters | Area of Selected Clusters [km²] | Municipality Area [km²] | Relative Cluster Size [%] |
|---|---|---|---|---|
| Aleksinac | 6 | 0.116 | 706.335 | 0.016 |
| Doljevac | 10 | 0.001 | 121.275 | 0.001 |
| Gadžin Han | 1 | 0.000 | 324.931 | 0.000 |
| Merošina | 2 | 0.000 | 193.089 | 0.000 |
| Niš | 2 | 1.594 | 449.929 | 0.354 |
| Niška Banja | 5 | 0.002 | 146.185 | 0.001 |
| Ražanj | 2 | 0.000 | 288.512 | 0.000 |
| Svrljig | 6 | 0.001 | 496.894 | 0.000 |
| Total | 34 | 1.714 | 2727.151 | 0.063 |
Table 9. External validation results.

| Municipality | # Camera Poles | # Cameras | # Covered Camera Poles | Share of Covered Camera Poles [%] | # Covered Cameras | Share of Covered Cameras [%] |
|---|---|---|---|---|---|---|
| Novi Beograd | 73 | 219 | 13 | 17.81 | 31 | 14.16 |
| Palilula | 7 | 17 | 4 | 57.14 | 10 | 58.82 |
| Savski venac | 13 | 20 | 9 | 69.23 | 15 | 75.00 |
| Stari grad | 15 | 42 | 12 | 80.00 | 35 | 83.33 |
| Vračar | 5 | 10 | 4 | 80.00 | 9 | 90.00 |
| Voždovac | 14 | 19 | 8 | 57.14 | 11 | 57.89 |
| Zemun | 24 | 57 | 9 | 37.50 | 22 | 38.60 |
| Zvezdara | 2 | 6 | 2 | 100.00 | 6 | 100.00 |
| Čukarica | 1 | 2 | 1 | 100.00 | 2 | 100.00 |
| Total | 154 | 392 | 62 | 40.26 | 141 | 35.97 |
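When reading the Total row of Table 9, it helps to note that its shares are consistent with pooled counts (all covered poles divided by all poles) rather than with an average of the per-municipality percentages. The short sketch below checks this for camera poles; the variable names are illustrative assumptions, and only the counts are taken from the table.

```python
# (camera poles, covered camera poles) per municipality, copied from Table 9
poles = {
    "Novi Beograd": (73, 13), "Palilula": (7, 4), "Savski venac": (13, 9),
    "Stari grad": (15, 12), "Vračar": (5, 4), "Voždovac": (14, 8),
    "Zemun": (24, 9), "Zvezdara": (2, 2), "Čukarica": (1, 1),
}

total_poles = sum(n for n, _ in poles.values())
covered_poles = sum(k for _, k in poles.values())

# Pooled share: every pole counts equally, so large municipalities dominate
pooled_share = round(100.0 * covered_poles / total_poles, 2)

# Unweighted mean of the per-municipality shares, shown for contrast only
mean_of_shares = round(
    sum(100.0 * k / n for n, k in poles.values()) / len(poles), 2
)

print(total_poles, covered_poles, pooled_share, mean_of_shares)
```

The pooled share reproduces the 40.26% in the Total row, while the unweighted mean is considerably higher (about 66.5%) because municipalities with only one or two poles and full coverage weigh as much as Novi Beograd with 73 poles.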
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Košanin, I.; Gnjatović, M.; Maček, N.; Joksimović, D. A Clustering-Based Approach to Detecting Critical Traffic Road Segments in Urban Areas. Axioms 2023, 12, 509. https://doi.org/10.3390/axioms12060509
