In this section, a method is proposed to reduce the average time for ambulances to reach patients, specifically by optimizing the layout of EMS stations. Common models for optimizing geospatial layout [10] fall into traditional models and dynamic location models. Traditional models include coverage models and P-median models [4], among others, while dynamic location models comprise spatial layout site selection and spatial relocation. The difference between these two dynamic variants is that, in the dynamic layout problem, the decision maker builds a new facility of a certain size at a new candidate point to satisfy the demand of the corresponding moment or period, whereas in the relocation problem, the decision maker selects an existing supply point and adjusts the size of the service facility through global scheduling to satisfy the demand of the corresponding moment or period.
This paper aims to relocate EMS stations by analyzing the latitude and longitude of patient calls in a historical dataset. A traditional model would need to abstract the locations of the calls for help into demand points, analyze all demand points, and finally derive the optimal layout of the EMS stations. However, the dataset in this article contains hundreds of thousands of records, and traditional methods would make the problem considerably more complex at this scale. Therefore, this paper uses clustering algorithms to solve the layout optimization problem for EMS stations.
Compared with other models, clustering algorithms compute faster, can divide patients into several categories by analyzing longitude and latitude information, and allow the k value to be adjusted to obtain the optimal solution. This flexibility is why clustering algorithms are used here to optimize the locations of EMS stations.
After comparing the performance of common clustering algorithms on the dataset of this paper, the k-means clustering algorithm was chosen as the analytical tool. Applying this algorithm to one year of patient call data in Shanghai yields k clusters, whose center points represent the ideal locations of EMS stations. In the k-means clustering algorithm, choosing an appropriate k value (that is, the number of clusters) is a crucial step that directly affects the quality of the clustering results. Therefore, this article combines the k-means clustering algorithm with the elbow method and the silhouette coefficient method to find the optimal k value and, finally, obtain the optimal layout of EMS stations.
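To make the approach concrete, the following is a minimal sketch (not the paper's actual code) of the core step: clustering call coordinates and taking the cluster centers as candidate station sites. The file name and column names are assumptions for illustration, and scikit-learn's KMeans is used as a stand-in implementation.

```python
# Minimal sketch: cluster one year of patient-call coordinates and take
# the k cluster centers as candidate EMS station locations.
# The file name and the "longitude"/"latitude" column names are hypothetical.
import pandas as pd
from sklearn.cluster import KMeans

calls = pd.read_csv("shanghai_calls.csv")            # hypothetical dataset
coords = calls[["longitude", "latitude"]].to_numpy()  # (n_samples, 2)

k = 4  # in the paper, k is chosen via the elbow method and silhouette coefficient
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(coords)

# Each cluster center is one proposed EMS station location.
for i, (lon, lat) in enumerate(km.cluster_centers_):
    print(f"station {i}: lon={lon:.4f}, lat={lat:.4f}")
```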
2.4.1. Comparison of Common Clustering Algorithms
Common clustering algorithms include the k-means clustering algorithm [26], DBSCAN, and hierarchical clustering. The k-means clustering algorithm is the most classical and commonly used method; it is simple and efficient and is widely applied across scientific fields.
A comparative analysis of these three clustering algorithms was conducted on the dataset presented in this article. The optimal algorithm was selected by observing the clustering results and the average distance between samples and cluster centers for different numbers of clusters. Because DBSCAN is a density-based clustering algorithm, its number of clusters cannot be set directly; it can only be varied by adjusting the neighborhood radius and the minimum number of points required to form a core point. The comparison results are shown in Table 4.
When the number of clusters is 4, the performance of the three algorithms is shown in Figure 5. The circles in the figure represent patients' emergency addresses, and the different colors represent the clusters obtained.
From Table 4 and Figure 5, it can be seen that the k-means and hierarchical clustering algorithms perform well, whereas DBSCAN performs poorly. Table 4 shows that, for every number of clusters tested, DBSCAN yields a longer average distance from samples to cluster centers, which would increase the average response time. Moreover, the data in this dataset are evenly distributed, while DBSCAN, being density-based, is better suited to datasets with distinct, irregularly shaped clusters, such as the example shown in Figure 6. The circles in that figure represent data points, and the different colors represent the clusters obtained.
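A rough sketch of how a comparison like Table 4 could be reproduced is shown below; it is an assumption-laden illustration, not the paper's benchmark. The `coords` array is the hypothetical lon/lat array from the earlier sketch, and the DBSCAN parameters `eps` and `min_samples` are arbitrary placeholders.

```python
# Sketch: mean distance from each sample to its cluster's center, for all
# three algorithms. For DBSCAN the "center" is taken as the mean of each
# cluster's points, and noise points (label -1) are excluded.
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering, DBSCAN

def mean_dist_to_centers(X, labels):
    dists = []
    for c in set(labels) - {-1}:              # skip DBSCAN noise label
        pts = X[labels == c]
        dists.append(np.linalg.norm(pts - pts.mean(axis=0), axis=1))
    return np.concatenate(dists).mean()

k = 4
results = {
    "k-means": KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(coords),
    "hierarchical": AgglomerativeClustering(n_clusters=k).fit_predict(coords),
    "DBSCAN": DBSCAN(eps=0.05, min_samples=10).fit_predict(coords),  # illustrative parameters
}
for name, labels in results.items():
    print(name, mean_dist_to_centers(coords, labels))
```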
Comparing the k-means and hierarchical clustering algorithms: hierarchical clustering constructs a tree-like hierarchy of the data, and each step requires distance calculations and merge operations over a large number of data-point pairs, which consumes considerable computing resources and time on large datasets. Its time complexity is usually between $O(n^2)$ and $O(n^3)$ (where $n$ is the number of data points). The k-means clustering algorithm, by contrast, ignores hierarchical structure and focuses only on the distances between data points and cluster centers. Its computation is relatively simple, updating only the k cluster centers in each iteration, so its computational cost is lower. Its time complexity is $O(nkt)$ (where $n$ is the number of data points, $k$ is the number of clusters, and $t$ is the number of iterations), making it suitable for large datasets. Taking the dataset in this article as an example, the running times of the k-means and hierarchical clustering algorithms are shown in Table 5.
According to Table 5, the running time of the k-means clustering algorithm is about 7 s shorter than that of the hierarchical clustering algorithm, indicating that k-means performs better than hierarchical clustering on this dataset.
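As a rough illustration of how timings like those in Table 5 could be measured (a sketch under the same assumptions as above; absolute times depend on hardware and data size, and hierarchical clustering may be memory-intensive on very large datasets):

```python
# Sketch: time both algorithms on the same coordinate array.
import time
from sklearn.cluster import KMeans, AgglomerativeClustering

for name, model in [
    ("k-means", KMeans(n_clusters=4, n_init=10, random_state=0)),
    ("hierarchical", AgglomerativeClustering(n_clusters=4)),
]:
    t0 = time.perf_counter()
    model.fit(coords)
    print(f"{name}: {time.perf_counter() - t0:.2f} s")
```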
Based on these comparisons, this paper uses the k-means clustering algorithm to optimize the layout of EMS stations.
2.4.2. K-Means Clustering Algorithm
The k-means clustering algorithm is an unsupervised machine learning algorithm [27] that requires only unlabeled data. It is an iterative algorithm with low computational complexity, high speed, and low computational cost, suitable for large-scale datasets. Without knowing any sample labels, it can divide samples into several categories based on the intrinsic relationships in the data, so that similarity is high between samples of the same category and low between samples of different categories [28]. The k-means algorithm assigns data points to k cluster centers, each of which represents an optimized EMS station. This makes the optimized layout of EMS stations clear and intuitive, which aids practical operation and decision-making.
The specific steps of the k-means clustering algorithm are as follows (a minimal implementation sketch is given after the list):
Select k cluster centers as the initial cluster centers;
Calculate the distance between each sample point and each cluster center separately, and assign each sample to the cluster center closest to it;
Update the cluster center of each cluster, which is defined as the mean of all samples in each dimension within the cluster;
Compare the updated centers with the k cluster centers from the previous iteration; if any center has changed, return to step 2; otherwise, proceed to step 5;
When the cluster centers no longer change, stop iterating and output the clustering results.
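The following is a minimal from-scratch sketch of steps 1 through 5 above, using NumPy only; the experiments in a paper like this would typically use a library implementation such as scikit-learn's KMeans instead.

```python
# Minimal k-means, following the five steps listed above.
# Assumes no cluster becomes empty during iteration (a library
# implementation handles that case more carefully).
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: pick k samples as the initial cluster centers.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 2: assign each sample to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recompute each center as the mean of its cluster.
        new_centers = np.array([X[labels == c].mean(axis=0) for c in range(k)])
        # Steps 4-5: stop once the centers no longer change.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```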
The parameter k in the k-means algorithm is difficult to determine; it usually needs to be specified in advance [29], taken from empirical values, or obtained through repeated experiments. There is no universal reference value of k across datasets, and different k values can lead to very different numbers of iterations, affecting the accuracy of the clustering results. Therefore, the first step toward good clustering results is to determine the optimal number of clusters [30]. The elbow method and the silhouette coefficient are common methods for determining the optimal number of clusters k in the k-means algorithm [31].
The silhouette coefficient evaluates clustering results by measuring the tightness and separation of the clusters. It is a commonly used metric for contour analysis, takes values from −1 to 1, and indicates how close a sample is to its own cluster and how well separated it is from the other clusters. The silhouette coefficient is calculated separately for each sample, as shown in Equation (1):

$$ s(i) = \frac{b(i) - a(i)}{\max\{a(i),\ b(i)\}} \quad (1) $$

Here, $a(i)$ represents the degree of intra-cluster dissimilarity, i.e., the average distance between the current sample and the other samples in its cluster; $b(i)$ represents the degree of inter-cluster dissimilarity, i.e., the average distance from the current sample to the samples in the nearest other cluster. The average silhouette coefficient over all samples is called the silhouette coefficient of the clustering result. The closer $s(i)$ is to 1, the tighter the sample sits within its own cluster and the better it is separated from other clusters; the closer $s(i)$ is to −1, the less tightly the sample sits within its own cluster and the more it overlaps with the boundaries of other clusters; if $s(i)$ is close to 0, the sample lies near a cluster boundary. When applying silhouette analysis, the number of clusters with the highest silhouette coefficient is usually selected as the optimal number of clusters.
By calculating the silhouette coefficients of the samples, the quality of the clustering results can be evaluated, and the optimal number of clusters can be selected accordingly. Silhouette analysis thus provides a quantitative basis for choosing the number of clusters and supports better clustering decisions.
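A sketch of silhouette-based selection of k, using scikit-learn's `silhouette_score` as a stand-in (the candidate range of k values and the subsampling size are assumptions; subsampling is used because the full silhouette computation is quadratic in the number of samples):

```python
# Sketch: compute the average silhouette coefficient for a range of k
# values and keep the k that maximizes it.
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

scores = {}
for k in range(2, 11):  # silhouette analysis requires k >= 2
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(coords)
    # sample_size limits the O(n^2) distance computation on large datasets
    scores[k] = silhouette_score(coords, labels, sample_size=10000, random_state=0)

best_k = max(scores, key=scores.get)
print("best k by silhouette coefficient:", best_k)
```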
The elbow method evaluates the quality of the clusters based on the within-cluster sum of squared errors (SSE). The point where the improvement in SSE slows is the elbow, which is usually used to determine the optimal k value. The SSE objective function is as follows:

$$ \mathrm{SSE} = \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - \mu_i \rVert^2 $$

Here, $\lVert x - \mu_i \rVert$ is the standard Euclidean distance, $k$ is the number of clusters, $x$ is a sample point in cluster $C_i$, and $\mu_i$ is the centroid of cluster $C_i$. The k-means algorithm uses SSE as the objective function to measure clustering quality. The SSE of each cluster equals the sum of squared distances from each of its sample points to the cluster center, and the sum over all clusters gives the SSE of the clustering result. The more compact the members of a cluster, the smaller its SSE; conversely, the more dispersed the members, the larger its SSE.
When k is smaller than the optimal number of clusters, increasing k substantially improves the compactness of each cluster, so SSE drops sharply. Once k reaches the optimal number of clusters, the gain in compactness from further increases in k falls off rapidly, so the decrease in SSE slows abruptly and then flattens as k grows. In other words, the plot of SSE against k has the shape of an elbow, and the k value at the elbow is the optimal number of clusters for the data.
Figure 7 shows the line graph produced by the elbow method, from which it can be seen that the curve starts to flatten at a certain value of k; that point is the "elbow point", i.e., the optimal value of k being sought.
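A sketch of how an elbow plot like Figure 7 could be produced (the range of k values is an assumption; scikit-learn's `inertia_` attribute is the fitted model's SSE):

```python
# Sketch: plot SSE against k and look for the bend where the curve flattens.
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

ks = range(1, 11)
sse = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(coords).inertia_
       for k in ks]

plt.plot(ks, sse, marker="o")
plt.xlabel("k (number of clusters)")
plt.ylabel("SSE")
plt.title("Elbow method")
plt.show()
```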