Self-Adjusting Variable Neighborhood Search Algorithm for Near-Optimal k-Means Clustering
Abstract
1. Introduction
1.1. Problem Statement
1.2. State of the Art
1.3. Research Gap
1.4. Our Contribution
- (a) The choice of the value of the parameter r (the number of excessive centroids, see above) in the greedy agglomerative heuristic procedure significantly affects the efficiency of the procedure.
- (b) Since it is hardly possible to determine the optimal value of this parameter from such numerical parameters of the k-means problem as the number of data vectors and the number of clusters, a reconnaissance (exploratory) search over various values of r can be useful.
- (c) Unlike the well-known VNS algorithms, which use greedy agglomerative heuristic procedures with an increasing value of the parameter r, a gradual decrease in the value of this parameter may be more effective.
1.5. Structure of this Article
2. Materials and Methods
2.1. The Simplest Approach
Algorithm 1. Lloyd(S)

Require: Set of initial centroids S = {X1, …, Xk}. If S is not given, then the initial centroids are selected randomly from the set of data vectors {A1, …, AN}.
repeat
1. For each centroid Xj, j ∈ {1, …, k}, define its cluster Cj in accordance with (2); // i.e., assign each data vector to the nearest centroid
2. For each cluster Cj, j ∈ {1, …, k}, recalculate its centroid: Xj ← (1/|Cj|) Σ_{Ai ∈ Cj} Ai;
until all centroids stay unchanged.
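For readers who prefer code to pseudocode, a minimal NumPy sketch of Algorithm 1 follows; the function name lloyd and the max_iter safeguard are our additions, not part of the original procedure.

```python
import numpy as np

def lloyd(A, X, max_iter=1000):
    """Minimal sketch of Algorithm 1. A: data vectors (N x d),
    X: initial centroids (k x d). Names are ours, not the authors'."""
    X = X.copy()
    labels = np.zeros(len(A), dtype=int)
    for _ in range(max_iter):
        # Step 1: assign each data vector to its nearest centroid
        d2 = ((A[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Step 2: recompute each centroid as the mean of its cluster
        X_new = np.array([A[labels == j].mean(axis=0) if np.any(labels == j)
                          else X[j] for j in range(len(X))])
        if np.allclose(X_new, X):   # centroids unchanged: stop
            break
        X = X_new
    return X, labels
```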
2.2. Local Search in SWAP Neighborhoods
2.3. Agglomerative Approach and GREEDYr Neighborhoods
Algorithm 2. BasicGreedy(S)

Require: Set of initial centroids S = {X1, …, XK}, K > k; required final number of centroids k.
while |S| > k do
  for i ∈ {1, …, |S|} do
    Fi ← SSE(S \ {Xi}); // objective value after tentative removal of the i-th centroid
  end for;
  Select a subset of rtoremove centroids with the minimum values of the corresponding variables Fi and remove them from S; // By default, rtoremove = 1.
end while.
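The elimination loop of Algorithm 2 can be sketched as follows (a direct, unoptimized implementation with rtoremove = 1; the helper names are ours). The sse() helper defined here is reused by the later sketches.

```python
import numpy as np

def sse(A, X):
    """Sum of squared distances from each data vector to its nearest centroid."""
    d2 = ((A[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()

def basic_greedy(A, X, k):
    """Eliminate centroids one at a time (r_toremove = 1) until k remain,
    always removing the centroid whose removal hurts SSE the least."""
    X = X.copy()
    while len(X) > k:
        F = [sse(A, np.delete(X, i, axis=0)) for i in range(len(X))]
        X = np.delete(X, int(np.argmin(F)), axis=0)
    return X
```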
Algorithm 3. Greedy(S, S2, r)

Require: Two sets of centroids S, S2, |S| = |S2| = k; the number r of centroids of the solution S2 which are used to obtain the resulting solution, r ∈ {1, …, k}.
for i ∈ {1, …, nrepeats} do
  1. Select a subset S″ ⊆ S2, |S″| = r, at random;
  2. S′ ← BasicGreedy(S ∪ S″);
  3. if SSE(S′) < SSE(S) then S ← S′ end if;
end for;
return S.
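A sketch of Algorithm 3 under the reconstruction above, reusing sse() and basic_greedy() from the previous sketch; the repeat count max(1, k // r) is our assumption, carried over from Algorithm 6.

```python
import numpy as np
# reuses sse() and basic_greedy() from the sketch after Algorithm 2

def greedy_merge(A, S, S2, r, rng=np.random.default_rng()):
    """Sketch of Algorithm 3: inject r centroids of a donor solution S2
    into S and shrink back to k greedily, keeping improvements only."""
    k = len(S)
    for _ in range(max(1, k // r)):
        take = rng.choice(len(S2), size=r, replace=False)   # subset S'' of S2
        candidate = basic_greedy(A, np.vstack([S, S2[take]]), k)
        if sse(A, candidate) < sse(A, S):
            S = candidate
    return S
```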
2.4. Variable Neighborhood Search
Algorithm 4. VND(S)

Require: Initial solution S, selected neighborhoods nl, l ∈ {1, …, lmax}.
repeat
  l ← 1;
  while l ≤ lmax do
    search for the best neighbor S′ ∈ nl(S);
    if f(S′) < f(S) then S ← S′; l ← 1 else l ← l + 1 end if;
  end while;
until the stop conditions are satisfied.
Algorithm 5. RVNS(S)

Require: Initial solution S, selected neighborhoods nl, l ∈ {1, …, lmax}.
repeat
  l ← 1;
  while l ≤ lmax do
    select randomly S′ ∈ nl(S);
    if f(S′) < f(S) then S ← S′; l ← 1 else l ← l + 1 end if;
  end while;
until the stop conditions are satisfied.
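Algorithms 4 and 5 differ only in how the neighbor S′ is obtained; the following sketch shows both, with the neighborhoods passed in as callables (all names here are ours).

```python
import time

def vnd(s, f, best_neighbor, l_max):
    """Algorithm 4 (VND), single descent pass: best_neighbor(s, l) must
    return the best solution in neighborhood n_l of s."""
    l = 1
    while l <= l_max:
        s_prime = best_neighbor(s, l)
        if f(s_prime) < f(s):
            s, l = s_prime, 1          # improvement: restart from n_1
        else:
            l += 1                     # no improvement: next neighborhood
    return s

def rvns(s, f, random_neighbor, l_max, time_limit):
    """Algorithm 5 (RVNS): identical control flow, but a *random* neighbor
    is drawn from n_l(s) instead of searching the whole neighborhood."""
    deadline = time.time() + time_limit
    while time.time() < deadline:      # outer stop condition
        l = 1
        while l <= l_max:
            s_prime = random_neighbor(s, l)
            if f(s_prime) < f(s):
                s, l = s_prime, 1
            else:
                l += 1
    return s
```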
2.5. New Algorithm
Algorithm 6. DecreaseGreedySearch(S, r0)

Require: Initial solution S, initial value r ← r0 of the parameter r.
select randomly S2 ⊂ {A1, …, AN}, |S2| = k;
repeat
  nrepeats ← max{1, ⌊k/r⌋};
  for i ∈ {1, …, nrepeats} do
    1. select randomly S″ ⊆ S2, |S″| = r;
    2. S′ ← BasicGreedy(S ∪ S″);
    3. if SSE(S′) < SSE(S) then S ← S′ end if;
  end for;
  select randomly S2 ⊂ {A1, …, AN}, |S2| = k;
  if Steps 1–3 have not changed S then
    if r > 1 then r ← ⌈r/2⌉ else r ← 1 end if;
  end if;
until the stop conditions are satisfied (time limitation).
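A sketch of Algorithm 6 as reconstructed above; the halving rule r ← ⌈r/2⌉ and the repeat count are assumptions consistent with the reconnaissance schedule of Algorithm 7. It reuses sse() and basic_greedy() from the sketch after Algorithm 2.

```python
import time
import numpy as np
# reuses sse() and basic_greedy() from the sketch after Algorithm 2

def decrease_greedy_search(A, S, r0, time_limit, seed=0):
    """Sketch of Algorithm 6: greedy merges with a random donor solution;
    r is halved whenever a full pass brings no improvement."""
    rng = np.random.default_rng(seed)
    k, r = len(S), r0
    deadline = time.time() + time_limit
    while time.time() < deadline:
        S2 = A[rng.choice(len(A), size=k, replace=False)]   # random donor solution
        improved = False
        for _ in range(max(1, k // r)):
            take = rng.choice(k, size=r, replace=False)     # subset S'' of S2
            cand = basic_greedy(A, np.vstack([S, S2[take]]), k)
            if sse(A, cand) < sse(A, S):
                S, improved = cand, True
        if not improved and r > 1:
            r = (r + 1) // 2            # gradual decrease: ceil(r / 2)
    return S
```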
Algorithm 7. AdaptiveGreedy(S) solver

Require: the number of reconnaissance search iterations nrecon.
select randomly S ⊂ {A1, …, AN}, |S| = k;
for i ∈ {1, …, nrecon} do
  select randomly S2,i ⊂ {A1, …, AN}, |S2,i| = k;
end for;
r ← k;
repeat
  Sr ← S; nrepeats ← max{1, ⌊k/r⌋};
  for i ∈ {1, …, nrecon} do
    for j ∈ {1, …, nrepeats} do
      select randomly S″ ⊆ S2,i, |S″| = r; S′ ← BasicGreedy(Sr ∪ S″);
      if SSE(S′) < SSE(Sr) then Sr ← S′ end if;
    end for;
  end for;
  r ← ⌈r/2⌉;
until all values r = k, ⌈k/2⌉, …, 1 have been examined;
select the value r with the minimum value of SSE(Sr);
S ← Sr;
DecreaseGreedySearch(S, r).
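Putting the pieces together, an end-to-end sketch of Algorithm 7 follows; it again reuses the helpers defined after Algorithms 2 and 6, and the halving schedule for r is our reading of the garbled source, not a verified detail.

```python
import numpy as np
# reuses sse(), basic_greedy() and decrease_greedy_search() from the sketches above

def adaptive_greedy(A, k, n_recon, time_limit, seed=0):
    """Sketch of Algorithm 7: reconnaissance over r = k, ceil(k/2), ..., 1,
    then DecreaseGreedySearch started from the best (S_r, r) pair found."""
    rng = np.random.default_rng(seed)
    S = A[rng.choice(len(A), size=k, replace=False)]        # random initial solution
    donors = [A[rng.choice(len(A), size=k, replace=False)]  # S_2,i, i = 1..n_recon
              for _ in range(n_recon)]
    results, r = {}, k
    while True:
        S_r = S.copy()
        for S2 in donors:                                   # reconnaissance pass for r
            for _ in range(max(1, k // r)):
                take = rng.choice(k, size=r, replace=False)
                cand = basic_greedy(A, np.vstack([S_r, S2[take]]), k)
                if sse(A, cand) < sse(A, S_r):
                    S_r = cand
        results[r] = S_r
        if r == 1:
            break
        r = (r + 1) // 2                # halve r between reconnaissance rounds
    best_r = min(results, key=lambda rr: sse(A, results[rr]))
    return decrease_greedy_search(A, results[best_r], best_r, time_limit, seed)
```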
2.6. CUDA Implementation
Algorithm 8. CUDA kernel implementation of Step 1 in Lloyd's procedure (Algorithm 1)

// executed in parallel for each data vector; i is the global thread index
if i > N then return end if;
Dnearest ← +∞; // distance from Ai to the nearest centroid
for j ∈ {1, …, k} do
  if ‖Ai − Xj‖ < Dnearest then Dnearest ← ‖Ai − Xj‖; n ← j end if
end for;
sumn ← sumn + Ai; countern ← countern + 1;
SSE ← SSE + Dnearest². // objective function adder
Algorithm 9. CUDA kernel implementation of calculating Fi ← SSE(S \ {Xi}) in the BasicGreedy procedure (Algorithm 2)

Require: index i of the centroid being eliminated; l is the global thread index over data vectors.
if l > N then return end if;
Dnearest ← +∞; // distance from Al to the nearest centroid except Xi
for j ∈ {1, …, k} \ {i} do
  if ‖Al − Xj‖ < Dnearest then Dnearest ← ‖Al − Xj‖ end if
end for;
Fi ← Fi + Dnearest².
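Since the two kernels above are reconstructed from a damaged source, a CPU-side NumPy reference of what each thread computes may help validate a CUDA port; the array layouts (A of shape N×d, X of shape k×d) are our assumptions.

```python
import numpy as np

def lloyd_step1_reference(A, X):
    """CPU reference for Algorithm 8: nearest-centroid assignment,
    per-cluster sums/counters, and the SSE adder."""
    d2 = ((A[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)  # (N, k) squared distances
    nearest = d2.argmin(axis=1)                              # cluster index n per vector
    total_sse = d2[np.arange(len(A)), nearest].sum()         # objective function adder
    k, dim = X.shape
    sums = np.zeros((k, dim))
    counters = np.zeros(k, dtype=int)
    np.add.at(sums, nearest, A)        # sum_n += A_i
    np.add.at(counters, nearest, 1)    # counter_n += 1
    return nearest, sums, counters, total_sse

def basic_greedy_F_reference(A, X, i):
    """CPU reference for Algorithm 9: F_i = SSE with centroid X_i removed."""
    X_wo = np.delete(X, i, axis=0)
    d2 = ((A[:, None, :] - X_wo[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()
```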
2.7. Benchmarking Data
- (a) Individual household electric power consumption (IHEPC): energy consumption data of households over several years (more than 2 million data vectors, 7 dimensions), 0–1 normalized, with the "date" and "time" columns removed (a preprocessing sketch follows this list);
- (b) BIRCH3 [121]: one hundred groups of points of random size on a plane (10⁵ data vectors, 2 dimensions);
- (c) S1 data set: Gaussian clusters with cluster overlap (5000 data vectors, 2 dimensions);
- (d) Mopsi-Joensuu: geographic locations of users (6014 data vectors, 2 dimensions) in Joensuu city;
- (e) Mopsi-Finland: geographic locations of users (13,467 data vectors, 2 dimensions) in Finland.
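As an illustration of the preprocessing described in item (a), a short pandas sketch follows; the file name, separator, and column capitalization match the UCI distribution of IHEPC but should be verified locally.

```python
import pandas as pd

# File name, separator, and column names ("Date", "Time", "?" for missing
# values) follow the UCI distribution of IHEPC; verify before use.
df = pd.read_csv("household_power_consumption.txt", sep=";", na_values="?",
                 low_memory=False)
df = df.drop(columns=["Date", "Time"]).dropna().astype(float)

# 0-1 (min-max) normalization of the remaining 7 numeric columns
A = ((df - df.min()) / (df.max() - df.min())).to_numpy()
print(A.shape)   # roughly 2 million rows x 7 columns after dropping missing rows
```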
2.8. Computational Environment
3. Results
4. Discussion
5. Conclusions
Author Contributions
Funding
Conflicts of Interest
Abbreviations
NP | Non-deterministic polynomial-time |
MSSC | Minimum Sum-of-Squares Clustering |
SSE | Sum of Squared Errors |
ALA algorithm | Alternate Location-Allocation algorithm |
VNS | Variable Neighborhood Search |
GA | Genetic Algorithm |
IBC | Information Bottleneck Clustering |
VND | Variable Neighborhood Descent |
RVNS | Randomized Variable Neighborhood Search |
GPU | Graphics Processing Unit |
CPU | Central Processing Unit |
RAM | Random Access Memory |
CUDA | Compute Unified Device Architecture |
IHEPC | Individual Household Electric Power Consumption |
Lloyd-MS | Lloyd’s procedure in a multi-start mode |
J-means-MS | J-Means algorithm in a multi-start mode (SWAP1+Lloyd VND) |
GREEDYr | A neighborhood formed by applying greedy agglomerative procedures with r excessive clusters, and the RVNS algorithm which combines search in such neighborhood with Lloyd’s procedure |
SWAPr | A neighborhood formed by replacing r centroids by data vectors, and the RVNS algorithm which combines search in such neighborhood with Lloyd’s procedure |
GH-VNS1 | VNS algorithm with GREEDYr neighborhoods and GREEDY1 for the initial neighborhood type |
GH-VNS2 | VNS algorithm with GREEDYr neighborhoods and GREEDYrandom for the initial neighborhood type |
GH-VNS3 | VNS algorithm with GREEDYr neighborhoods and GREEDYk for the initial neighborhood type |
GA-1 | Genetic algorithm with the single-point crossover, real-valued genes encoded by centroid positions, and the uniform random mutation |
AdaptiveGreedy | New algorithm proposed in this article |
Appendix A. Results of Computational Experiments
Mopsi-Joensuu data set: 6014 data vectors in ℝ², k = 30 clusters, time limitation 5 s. Achieved SSE summarized after 30 runs.

Algorithm or Neighborhood | Min (Record) | Max (Worst) | Average | Median | Std.dev
---|---|---|---|---|---
Lloyd-MS | 35.5712 | 43.3993 | 39.1185 | 38.7718 | 2.9733 |
j-Means-MS | 18.4076 | 23.7032 | 20.3399 | 19.8533 | 1.8603 |
GREEDY1 | 18.3253 | 27.6990 | 21.4555 | 21.6629 | 3.1291 |
GREEDY2 | 18.3253 | 21.7008 | 19.3776 | 18.3254 | 1.6119 |
GREEDY3 | 18.3145 | 21.7007 | 18.5817 | 18.3254 | 0.9372 |
GREEDY5 | 18.3253 | 21.7007 | 18.5129 | 18.3254 | 0.7956 |
GREEDY7 | 18.3253 | 21.7008 | 18.5665 | 18.3255 | 0.9021 |
GREEDY10 | 18.3253 | 21.7010 | 18.5666 | 18.3255 | 0.9021 |
GREEDY12 | 18.3254 | 21.7009 | 18.5852 | 18.3256 | 0.9362 |
GREEDY15 | 18.3254 | 18.3257 | 18.3255 | 18.3255 | 0.0001 |
GREEDY20 | 18.3254 | 18.3263 | 18.3257 | 18.3257 | 0.0002 |
GREEDY25 | 18.3254 | 18.3257 | 18.3255 | 18.3255 | 0.0001 |
GREEDY30 | 18.3254 | 18.3261 | 18.3258 | 18.3258 | 0.0002 |
GH-VNS1 | 18.3147 | 18.3255 | 18.3238 | 18.3253 | 0.0039 |
GH-VNS2 | 18.3253 | 21.7008 | 19.3776 | 18.3254 | 1.6119 |
GH-VNS3 | 18.3146 | 21.6801 | 18.5634 | 18.3254 | 0.8971 |
SWAP1 (the best of SWAPr) | 18.9082 | 20.3330 | 19.4087 | 18.9967 | 0.6019 |
GA-1 | 18.6478 | 21.1531 | 19.9555 | 19.9877 | 0.6632 |
AdaptiveGreedy | 18.3146 | 18.3258 | 18.3240 | 18.3253 | 0.0037 |
Mopsi-Joensuu data set: 6014 data vectors in ℝ², k = 100 clusters, time limitation 5 s. Achieved SSE summarized after 30 runs.

Algorithm or Neighborhood | Min (Record) | Max (Worst) | Average | Median | Std.dev
---|---|---|---|---|---
Lloyd-MS | 23.1641 | 34.7834 | 27.5520 | 27.1383 | 3.6436 |
j-Means-MS | 1.7628 | 31.8962 | 11.1832 | 2.4216 | 11.7961 |
GREEDY1 | 20.6701 | 35.5447 | 28.9970 | 29.2429 | 5.0432 |
GREEDY2 | 2.8264 | 29.0682 | 9.9708 | 5.3363 | 9.6186 |
GREEDY3 | 2.6690 | 10.5998 | 4.1444 | 3.0588 | 2.2108 |
GREEDY5 | 1.9611 | 4.3128 | 2.7385 | 2.7299 | 0.6135 |
GREEDY7 | 2.0837 | 4.6443 | 2.8730 | 2.6358 | 0.7431 |
GREEDY10 | 1.9778 | 3.8635 | 2.5613 | 2.3304 | 0.6126 |
GREEDY12 | 1.7817 | 4.3023 | 2.5639 | 2.2009 | 0.8730 |
GREEDY15 | 1.9564 | 3.1567 | 2.3884 | 2.2441 | 0.3620 |
GREEDY20 | 1.7937 | 3.2809 | 2.4542 | 2.3500 | 0.4746 |
GREEDY25 | 1.9532 | 3.3874 | 2.4195 | 2.2575 | 0.5470 |
GREEDY30 | 1.9274 | 2.4580 | 2.1723 | 2.1458 | 0.2171 |
GREEDY50 | 1.8903 | 9.3675 | 2.8047 | 2.1614 | 2.0838 |
GREEDY75 | 1.7878 | 2.8855 | 2.1775 | 2.0272 | 0.4023 |
GREEDY100 | 1.8021 | 2.2942 | 2.0158 | 1.9849 | 0.1860 |
GH-VNS1 | 2.8763 | 17.1139 | 7.3196 | 4.3341 | 5.7333 |
GH-VNS2 | 2.8264 | 29.0682 | 9.9708 | 5.3363 | 9.6186 |
GH-VNS3 | 1.7643 | 2.7357 | 2.0513 | 1.9822 | 0.2699 |
SWAP3 (the best of rand. SWAPr) | 4.9739 | 23.6572 | 9.0159 | 8.3907 | 4.1351 |
GA-1 | 4.8922 | 19.1543 | 8.5914 | 7.1764 | 4.1096 |
AdaptiveGreedy | 1.7759 | 2.3265 | 1.9578 | 1.9229 | 0.1523 |
Mopsi-Joensuu data set: 6014 data vectors in ℝ², k = 300 clusters, time limitation 5 s. Achieved SSE summarized after 30 runs.

Algorithm or Neighborhood | Min (Record) | Max (Worst) | Average | Median | Std.dev
---|---|---|---|---|---
Lloyd-MS | 4.1789 | 14.7570 | 9.1143 | 9.3119 | 3.0822 |
j-Means-MS | 7.0119 | 22.3126 | 14.2774 | 12.6199 | 5.5095 |
GREEDY1 | 7.1654 | 15.3500 | 9.6113 | 9.2176 | 2.5266 |
GREEDY2 | 4.9896 | 14.4839 | 8.9197 | 8.2013 | 3.3072 |
GREEDY3 | 5.8967 | 14.1110 | 8.3260 | 8.0441 | 2.2140 |
GREEDY5 | 2.9115 | 10.2536 | 5.8012 | 5.7305 | 2.2740 |
GREEDY7 | 2.6045 | 7.9868 | 4.4201 | 4.0548 | 1.4841 |
GREEDY10 | 2.5497 | 8.6758 | 4.1796 | 2.9639 | 1.8494 |
GREEDY12 | 2.0753 | 4.7134 | 3.0383 | 2.8777 | 0.8348 |
GREEDY15 | 1.8975 | 8.7890 | 3.8615 | 3.2661 | 1.8064 |
GREEDY20 | 1.1878 | 3.7944 | 2.4577 | 2.4882 | 0.9554 |
GREEDY25 | 1.1691 | 3.5299 | 1.8489 | 1.6407 | 0.7460 |
GREEDY30 | 1.1151 | 4.9425 | 2.3711 | 2.0582 | 1.1501 |
GREEDY50 | 1.3526 | 3.5471 | 1.8635 | 1.7114 | 0.6046 |
GREEDY75 | 1.0533 | 5.5915 | 1.9129 | 1.4261 | 1.2082 |
GREEDY100 | 0.8047 | 2.0349 | 1.2602 | 1.1994 | 0.3811 |
GREEDY150 | 0.6243 | 1.4755 | 0.8743 | 0.8301 | 0.2447 |
GREEDY200 | 0.4555 | 1.0154 | 0.6746 | 0.5882 | 0.2103 |
GREEDY250 | 0.4789 | 1.3368 | 0.7233 | 0.6695 | 0.2164 |
GREEDY300 | 0.5474 | 1.0472 | 0.7228 | 0.6657 | 0.1419 |
GH-VNS1 | 1.6219 | 5.2528 | 3.0423 | 3.1332 | 1.0222 |
GH-VNS2 | 1.2073 | 8.6144 | 3.2228 | 2.3501 | 2.4014 |
GH-VNS3 | 0.4321 | 0.6838 | 0.6024 | 0.6139 | 0.0836 |
SWAP12 (the best of SWAP by median) | 2.6016 | 5.5038 | 3.6219 | 3.3612 | 1.0115 |
SWAP20 (the best of SWAP by avg.) | 2.1630 | 5.1235 | 3.4958 | 3.4076 | 0.8652 |
GA-1 | 5.4911 | 12.6950 | 8.8799 | 7.7181 | 2.5384 |
AdaptiveGreedy | 0.3128 | 0.6352 | 0.4672 | 0.4604 | 0.1026 |
Mopsi-Finland data set: 13,467 data vectors in ℝ², k = 30 clusters, time limitation 5 s. Achieved SSE summarized after 30 runs.

Algorithm or Neighborhood | Min (Record) | Max (Worst) | Average | Median | Std.dev
---|---|---|---|---|---
Lloyd-MS | 4.79217 × 10¹⁰ | 6.36078 × 10¹⁰ | 5.74896 × 10¹⁰ | 5.79836 × 10¹⁰ | 3.69760 × 10⁹
j-Means-MS | 3.43535 × 10¹⁰ | 4.26830 × 10¹⁰ | 3.66069 × 10¹⁰ | 3.60666 × 10¹⁰ | 1.75725 × 10⁹
GREEDY1 | 3.43195 × 10¹⁰ | 3.70609 × 10¹⁰ | 3.51052 × 10¹⁰ | 3.48431 × 10¹⁰ | 7.42636 × 10⁸
GREEDY2 | 3.43194 × 10¹⁰ | 3.49405 × 10¹⁰ | 3.44496 × 10¹⁰ | 3.44140 × 10¹⁰ | 1.64360 × 10⁸
GREEDY3 | 3.43195 × 10¹⁰ | 3.49411 × 10¹⁰ | 3.44474 × 10¹⁰ | 3.44140 × 10¹⁰ | 1.71131 × 10⁸
GREEDY5 | 3.43195 × 10¹⁰ | 3.48411 × 10¹⁰ | 3.44663 × 10¹⁰ | 3.44141 × 10¹⁰ | 1.65153 × 10⁸
GREEDY7 | 3.42531 × 10¹⁰ | 3.47610 × 10¹⁰ | 3.44091 × 10¹⁰ | 3.43504 × 10¹⁰ | 1.76023 × 10⁸
GREEDY10 | 3.42560 × 10¹⁰ | 3.48824 × 10¹⁰ | 3.45106 × 10¹⁰ | 3.43573 × 10¹⁰ | 2.36526 × 10⁸
GREEDY12 | 3.42606 × 10¹⁰ | 3.48822 × 10¹⁰ | 3.44507 × 10¹⁰ | 3.43901 × 10¹⁰ | 1.68986 × 10⁸
GREEDY15 | 3.42931 × 10¹⁰ | 3.47817 × 10¹⁰ | 3.43874 × 10¹⁰ | 3.43901 × 10¹⁰ | 8.31510 × 10⁷
GREEDY20 | 3.42954 × 10¹⁰ | 3.48826 × 10¹⁰ | 3.44186 × 10¹⁰ | 3.43905 × 10¹⁰ | 1.28972 × 10⁸
GREEDY25 | 3.43877 × 10¹⁰ | 3.44951 × 10¹⁰ | 3.43982 × 10¹⁰ | 3.43907 × 10¹⁰ | 2.57320 × 10⁷
GREEDY30 | 3.43900 × 10¹⁰ | 3.48967 × 10¹⁰ | 3.45169 × 10¹⁰ | 3.43979 × 10¹⁰ | 1.93565 × 10⁸
GH-VNS1 | 3.42626 × 10¹⁰ | 3.48724 × 10¹⁰ | 3.45244 × 10¹⁰ | 3.44144 × 10¹⁰ | 2.00510 × 10⁸
GH-VNS2 | 3.42528 × 10¹⁰ | 3.48723 × 10¹⁰ | 3.44086 × 10¹⁰ | 3.43474 × 10¹⁰ | 1.54771 × 10⁸
GH-VNS3 | 3.42528 × 10¹⁰ | 3.47955 × 10¹⁰ | 3.43826 × 10¹⁰ | 3.43474 × 10¹⁰ | 1.02356 × 10⁸
SWAP1 (the best of SWAPr) | 3.43199 × 10¹⁰ | 3.55777 × 10¹⁰ | 3.46821 × 10¹⁰ | 3.46056 × 10¹⁰ | 3.22711 × 10⁸
GA-1 | 3.48343 × 10¹⁰ | 3.81846 × 10¹⁰ | 3.65004 × 10¹⁰ | 3.64415 × 10¹⁰ | 1.00523 × 10⁹
AdaptiveGreedy | 3.42528 × 10¹⁰ | 3.47353 × 10¹⁰ | 3.43385 × 10¹⁰ | 3.43473 × 10¹⁰ | 1.03984 × 10⁸
Mopsi-Finland data set: 13,467 data vectors in ℝ², k = 300 clusters, time limitation 5 s. Achieved SSE summarized after 30 runs.

Algorithm or Neighborhood | Min (Record) | Max (Worst) | Average | Median | Std.dev
---|---|---|---|---|---
Lloyd-MS | 5.41643 × 10⁹ | 6.89261 × 10⁹ | 6.25619 × 10⁹ | 6.24387 × 10⁹ | 3.23827 × 10⁸
j-Means-MS | 6.75216 × 10⁸ | 1.38889 × 10⁹ | 8.92782 × 10⁸ | 8.35397 × 10⁸ | 1.86995 × 10⁸
GREEDY1 | 4.08445 × 10⁹ | 9.07208 × 10⁹ | 5.89974 × 10⁹ | 5.59903 × 10⁹ | 1.47601 × 10⁸
GREEDY2 | 1.11352 × 10⁹ | 2.10247 × 10⁹ | 1.59229 × 10⁹ | 1.69165 × 10⁹ | 2.89625 × 10⁸
GREEDY3 | 9.63842 × 10⁸ | 2.15674 × 10⁹ | 1.61490 × 10⁹ | 1.60123 × 10⁹ | 3.06567 × 10⁸
GREEDY5 | 9.11944 × 10⁸ | 2.36799 × 10⁹ | 1.66021 × 10⁹ | 1.70448 × 10⁹ | 3.68575 × 10⁸
GREEDY7 | 1.17328 × 10⁹ | 2.44476 × 10⁹ | 1.77589 × 10⁹ | 1.80948 × 10⁹ | 2.68354 × 10⁸
GREEDY10 | 1.14221 × 10⁹ | 2.00426 × 10⁹ | 1.67586 × 10⁹ | 1.69601 × 10⁹ | 2.14822 × 10⁸
GREEDY12 | 9.41133 × 10⁸ | 2.28940 × 10⁹ | 1.59715 × 10⁹ | 1.62288 × 10⁹ | 3.01841 × 10⁸
GREEDY15 | 8.86983 × 10⁸ | 2.29776 × 10⁹ | 1.53989 × 10⁹ | 1.43319 × 10⁹ | 3.70138 × 10⁸
GREEDY20 | 1.02224 × 10⁹ | 2.11636 × 10⁹ | 1.62601 × 10⁹ | 1.64029 × 10⁹ | 2.45576 × 10⁸
GREEDY25 | 9.07984 × 10⁸ | 1.87134 × 10⁹ | 1.42878 × 10⁹ | 1.42864 × 10⁹ | 2.74744 × 10⁸
GREEDY30 | 8.44247 × 10⁸ | 2.22882 × 10⁹ | 1.50817 × 10⁹ | 1.56015 × 10⁹ | 3.52497 × 10⁸
GREEDY50 | 7.98191 × 10⁸ | 1.68198 × 10⁹ | 1.26851 × 10⁹ | 1.17794 × 10⁹ | 2.67082 × 10⁸
GREEDY75 | 6.97650 × 10⁸ | 1.74139 × 10⁹ | 1.16422 × 10⁹ | 1.16616 × 10⁹ | 2.82454 × 10⁸
GREEDY100 | 6.55465 × 10⁸ | 1.44162 × 10⁹ | 1.03643 × 10⁹ | 1.09001 × 10⁹ | 1.95246 × 10⁸
GREEDY150 | 5.94256 × 10⁸ | 1.45317 × 10⁹ | 8.88898 × 10⁸ | 7.96787 × 10⁸ | 2.33137 × 10⁸
GREEDY200 | 5.60885 × 10⁸ | 1.41411 × 10⁹ | 7.96908 × 10⁸ | 7.20282 × 10⁸ | 2.26191 × 10⁸
GREEDY250 | 5.58602 × 10⁸ | 1.13946 × 10⁹ | 7.58434 × 10⁸ | 6.81196 × 10⁸ | 1.65511 × 10⁸
GREEDY300 | 5.68646 × 10⁸ | 1.41338 × 10⁹ | 7.35067 × 10⁸ | 6.83004 × 10⁸ | 1.76126 × 10⁸
GH-VNS1 | 1.40141 × 10⁹ | 2.86919 × 10⁹ | 2.16238 × 10⁹ | 2.10817 × 10⁹ | 3.42105 × 10⁸
GH-VNS2 | 8.22679 × 10⁸ | 2.12228 × 10⁹ | 1.40322 × 10⁹ | 1.39457 × 10⁹ | 2.96599 × 10⁸
GH-VNS3 | 5.33373 × 10⁸ | 7.29800 × 10⁸ | 5.74914 × 10⁸ | 5.48427 × 10⁸ | 5.05346 × 10⁷
SWAP1 (the best of SWAPr) | 6.69501 × 10⁸ | 9.06507 × 10⁸ | 7.48932 × 10⁸ | 7.35532 × 10⁸ | 6.74846 × 10⁷
GA-1 | 4.54419 × 10⁹ | 7.11460 × 10⁹ | 5.67688 × 10⁹ | 5.61135 × 10⁹ | 5.99687 × 10⁸
AdaptiveGreedy | 5.27254 × 10⁸ | 7.09410 × 10⁸ | 5.60867 × 10⁸ | 5.38952 × 10⁸ | 4.89257 × 10⁷
BIRCH3 data set: 10⁵ data vectors in ℝ², k = 100 clusters, time limitation 10 s. Achieved SSE summarized after 30 runs.

Algorithm or Neighborhood | Min (Record) | Max (Worst) | Average | Median | Std.dev
---|---|---|---|---|---
Lloyd-MS | 8.13022 × 10¹³ | 9.51129 × 10¹³ | 8.96327 × 10¹³ | 9.06147 × 10¹³ | 4.84194 × 10¹²
j-Means-MS | 4.14627 × 10¹³ | 6.25398 × 10¹³ | 4.78063 × 10¹³ | 4.55711 × 10¹³ | 6.89734 × 10¹²
GREEDY1 | 3.73299 × 10¹³ | 5.64559 × 10¹³ | 4.13352 × 10¹³ | 3.90845 × 10¹³ | 5.19021 × 10¹²
GREEDY2 | 3.71499 × 10¹³ | 3.72063 × 10¹³ | 3.71689 × 10¹³ | 3.71565 × 10¹³ | 2.44802 × 10¹⁰
GREEDY3 | 3.71518 × 10¹³ | 3.72643 × 10¹³ | 3.71840 × 10¹³ | 3.71545 × 10¹³ | 4.12818 × 10¹⁰
GREEDY5 | 3.71485 × 10¹³ | 3.72087 × 10¹³ | 3.71644 × 10¹³ | 3.71518 × 10¹³ | 2.22600 × 10¹⁰
GREEDY7 | 3.71518 × 10¹³ | 3.72267 × 10¹³ | 3.71755 × 10¹³ | 3.71658 × 10¹³ | 2.24845 × 10¹⁰
GREEDY10 | 3.71555 × 10¹³ | 3.72119 × 10¹³ | 3.71771 × 10¹³ | 3.71794 × 10¹³ | 1.90289 × 10¹⁰
GREEDY12 | 3.71556 × 10¹³ | 3.72954 × 10¹³ | 3.71892 × 10¹³ | 3.71693 × 10¹³ | 3.91673 × 10¹⁰
GREEDY15 | 3.71626 × 10¹³ | 3.72169 × 10¹³ | 3.71931 × 10¹³ | 3.71963 × 10¹³ | 1.86102 × 10¹⁰
GREEDY20 | 3.71600 × 10¹³ | 3.72638 × 10¹³ | 3.72118 × 10¹³ | 3.72153 × 10¹³ | 2.69206 × 10¹⁰
GREEDY25 | 3.72042 × 10¹³ | 3.72690 × 10¹³ | 3.72284 × 10¹³ | 3.72228 × 10¹³ | 2.14437 × 10¹⁰
GREEDY30 | 3.72180 × 10¹³ | 3.73554 × 10¹³ | 3.72586 × 10¹³ | 3.72471 × 10¹³ | 4.33818 × 10¹⁰
GREEDY50 | 3.72166 × 10¹³ | 3.76422 × 10¹³ | 3.73883 × 10¹³ | 3.73681 × 10¹³ | 16.1061 × 10¹⁰
GREEDY75 | 3.72399 × 10¹³ | 3.84870 × 10¹³ | 3.76286 × 10¹³ | 3.74750 × 10¹³ | 41.6632 × 10¹⁰
GREEDY100 | 3.72530 × 10¹³ | 3.91589 × 10¹³ | 3.80730 × 10¹³ | 3.84482 × 10¹³ | 61.9706 × 10¹⁰
GH-VNS1 | 3.71914 × 10¹³ | 3.77527 × 10¹³ | 3.73186 × 10¹³ | 3.72562 × 10¹³ | 18.3590 × 10¹⁰
GH-VNS2 | 3.71568 × 10¹³ | 3.73791 × 10¹³ | 3.72116 × 10¹³ | 3.72051 × 10¹³ | 6.08081 × 10¹⁰
GH-VNS3 | 3.71619 × 10¹³ | 3.73487 × 10¹³ | 3.72387 × 10¹³ | 3.72282 × 10¹³ | 5.96618 × 10¹⁰
SWAP1 (the best of SWAPr) | 4.28705 × 10¹³ | 5.48014 × 10¹³ | 4.82383 × 10¹³ | 4.75120 × 10¹³ | 3.90128 × 10¹²
GA-1 | 3.84317 × 10¹³ | 4.08357 × 10¹³ | 3.97821 × 10¹³ | 3.97088 × 10¹³ | 7.43642 × 10¹¹
AdaptiveGreedy | 3.71484 × 10¹³ | 3.72011 × 10¹³ | 3.71726 × 10¹³ | 3.71749 × 10¹³ | 2.02784 × 10¹⁰
BIRCH3 data set: 10⁵ data vectors in ℝ², k = 300 clusters, time limitation 10 s. Achieved SSE summarized after 30 runs.

Algorithm or Neighborhood | Min (Record) | Max (Worst) | Average | Median | Std.dev
---|---|---|---|---|---
Lloyd-MS | 3.49605 × 10¹³ | 4.10899 × 10¹³ | 3.74773 × 10¹³ | 3.77191 × 10¹³ | 2.32012 × 10¹²
j-Means-MS | 1.58234 × 10¹³ | 2.02926 × 10¹³ | 1.75530 × 10¹³ | 1.70507 × 10¹³ | 1.43885 × 10¹²
GREEDY1 | 1.48735 × 10¹³ | 2.63695 × 10¹³ | 1.71372 × 10¹³ | 1.60354 × 10¹³ | 2.98555 × 10¹²
GREEDY2 | 1.31247 × 10¹³ | 1.45481 × 10¹³ | 1.37228 × 10¹³ | 1.36745 × 10¹³ | 4.01697 × 10¹¹
GREEDY3 | 1.34995 × 10¹³ | 1.49226 × 10¹³ | 1.39925 × 10¹³ | 1.39752 × 10¹³ | 4.85917 × 10¹¹
GREEDY5 | 1.33072 × 10¹³ | 1.45757 × 10¹³ | 1.39069 × 10¹³ | 1.38264 × 10¹³ | 4.46890 × 10¹¹
GREEDY7 | 1.34959 × 10¹³ | 1.49669 × 10¹³ | 1.41606 × 10¹³ | 1.41764 × 10¹³ | 4.92200 × 10¹¹
GREEDY10 | 1.31295 × 10¹³ | 1.42722 × 10¹³ | 1.35970 × 10¹³ | 1.35318 × 10¹³ | 3.70511 × 10¹¹
GREEDY12 | 1.32677 × 10¹³ | 1.49028 × 10¹³ | 1.35561 × 10¹³ | 1.33940 × 10¹³ | 4.44283 × 10¹¹
GREEDY15 | 1.32077 × 10¹³ | 1.41079 × 10¹³ | 1.34102 × 10¹³ | 1.33832 × 10¹³ | 2.16247 × 10¹¹
GREEDY20 | 1.31994 × 10¹³ | 1.43160 × 10¹³ | 1.35420 × 10¹³ | 1.34096 × 10¹³ | 3.43684 × 10¹¹
GREEDY25 | 1.31078 × 10¹³ | 1.37699 × 10¹³ | 1.33571 × 10¹³ | 1.33040 × 10¹³ | 2.16378 × 10¹¹
GREEDY30 | 1.32947 × 10¹³ | 1.45967 × 10¹³ | 1.37618 × 10¹³ | 1.36729 × 10¹³ | 3.92767 × 10¹¹
GREEDY50 | 1.32284 × 10¹³ | 1.38691 × 10¹³ | 1.34840 × 10¹³ | 1.33345 × 10¹³ | 2.70770 × 10¹¹
GREEDY75 | 1.30808 × 10¹³ | 1.33266 × 10¹³ | 1.31857 × 10¹³ | 1.31833 × 10¹³ | 7.22941 × 10¹⁰
GREEDY100 | 1.30852 × 10¹³ | 1.32697 × 10¹³ | 1.31250 × 10¹³ | 1.31067 × 10¹³ | 4.94315 × 10¹⁰
GREEDY150 | 1.30754 × 10¹³ | 1.31446 × 10¹³ | 1.30971 × 10¹³ | 1.30952 × 10¹³ | 1.82873 × 10¹⁰
GREEDY200 | 1.30773 × 10¹³ | 1.31172 × 10¹³ | 1.30916 × 10¹³ | 1.30912 × 10¹³ | 1.08001 × 10¹⁰
GREEDY250 | 1.30699 × 10¹³ | 1.31073 × 10¹³ | 1.30944 × 10¹³ | 1.30990 × 10¹³ | 1.18367 × 10¹⁰
GREEDY300 | 1.30684 × 10¹³ | 1.31068 × 10¹³ | 1.30917 × 10¹³ | 1.30933 × 10¹³ | 1.21748 × 10¹⁰
GH-VNS1 | 1.40452 × 10¹³ | 1.56256 × 10¹³ | 1.45212 × 10¹³ | 1.42545 × 10¹³ | 55.7231 × 10¹⁰
GH-VNS2 | 1.32287 × 10¹³ | 1.38727 × 10¹³ | 1.34654 × 10¹³ | 1.34568 × 10¹³ | 2.01065 × 10¹¹
GH-VNS3 | 1.30996 × 10¹³ | 1.31378 × 10¹³ | 1.31158 × 10¹³ | 1.31138 × 10¹³ | 1.44998 × 10¹⁰
SWAP2 (the best of SWAPr by median) | 2.18532 × 10¹³ | 3.25705 × 10¹³ | 2.54268 × 10¹³ | 2.37312 × 10¹³ | 3.78491 × 10¹²
SWAP7 (the best of SWAPr by avg.) | 2.24957 × 10¹³ | 2.86883 × 10¹³ | 2.46775 × 10¹³ | 2.47301 × 10¹³ | 1.51198 × 10¹²
GA-1 | 1.38160 × 10¹³ | 1.71472 × 10¹³ | 1.55644 × 10¹³ | 1.54336 × 10¹³ | 9.21217 × 10¹¹
AdaptiveGreedy | 1.30807 × 10¹³ | 1.31113 × 10¹³ | 1.30922 × 10¹³ | 1.30925 × 10¹³ | 0.87731 × 10¹⁰
S1 data set: 5000 data vectors in ℝ², k = 15 clusters, time limitation 1 s. Achieved SSE summarized after 30 runs.

Algorithm or Neighborhood | Min (Record) | Max (Worst) | Average | Median | Std.dev
---|---|---|---|---|---
Lloyd-MS | 8.91703 × 10¹² | 8.91707 × 10¹² | 8.91704 × 10¹² | 8.91703 × 10¹² | 1.31098 × 10⁷
j-Means-MS | 8.91703 × 10¹² | 14.2907 × 10¹² | 12.1154 × 10¹² | 13.3667 × 10¹² | 2.38947 × 10¹²
GREEDY1 | 8.91703 × 10¹² | 13.2502 × 10¹² | 9.27814 × 10¹² | 8.91703 × 10¹² | 1.25086 × 10¹²
GREEDY2 | 8.91703 × 10¹² | 8.91703 × 10¹² | 8.91703 × 10¹² | 8.91703 × 10¹² | 0.00000
GREEDY3 | 8.91703 × 10¹² | 8.91703 × 10¹² | 8.91703 × 10¹² | 8.91703 × 10¹² | 0.00000
GREEDY5 | 8.91703 × 10¹² | 8.91703 × 10¹² | 8.91703 × 10¹² | 8.91703 × 10¹² | 4.03023 × 10⁵
GREEDY7 | 8.91703 × 10¹² | 8.91703 × 10¹² | 8.91703 × 10¹² | 8.91703 × 10¹² | 4.87232 × 10⁵
GREEDY10 | 8.91703 × 10¹² | 8.91703 × 10¹² | 8.91703 × 10¹² | 8.91703 × 10¹² | 5.12234 × 10⁵
GREEDY12 | 8.91703 × 10¹² | 8.91703 × 10¹² | 8.91703 × 10¹² | 8.91703 × 10¹² | 3.16158 × 10⁵
GREEDY15 | 8.91703 × 10¹² | 8.91703 × 10¹² | 8.91703 × 10¹² | 8.91703 × 10¹² | 5.01968 × 10⁵
GH-VNS1 | 8.91703 × 10¹² | 8.91703 × 10¹² | 8.91703 × 10¹² | 8.91703 × 10¹² | 0.00000
GH-VNS2 | 8.91703 × 10¹² | 8.91703 × 10¹² | 8.91703 × 10¹² | 8.91703 × 10¹² | 0.00000
GH-VNS3 | 8.91703 × 10¹² | 8.91703 × 10¹² | 8.91703 × 10¹² | 8.91703 × 10¹² | 4.03023 × 10⁵
SWAP1 (the best of SWAP) | 8.91703 × 10¹² | 8.91709 × 10¹² | 8.91704 × 10¹² | 8.91703 × 10¹² | 8.67594 × 10⁶
GA-1 | 8.91703 × 10¹² | 8.91707 × 10¹² | 8.91703 × 10¹² | 8.91703 × 10¹² | 9.04519 × 10⁶
AdaptiveGreedy | 8.91703 × 10¹² | 8.91703 × 10¹² | 8.91703 × 10¹² | 8.91703 × 10¹² | 0.00000
S1 data set: 5000 data vectors in ℝ², k = 50 clusters, time limitation 1 s. Achieved SSE summarized after 30 runs.

Algorithm or Neighborhood | Min (Record) | Max (Worst) | Average | Median | Std.dev
---|---|---|---|---|---
Lloyd-MS | 3.94212 × 10¹² | 4.06133 × 10¹² | 3.99806 × 10¹² | 3.99730 × 10¹² | 4.52976 × 10¹⁰
j-Means-MS | 3.96626 × 10¹² | 4.40078 × 10¹² | 4.12311 × 10¹² | 4.07123 × 10¹² | 14.81090 × 10¹⁰
GREEDY1 | 3.82369 × 10¹² | 4.19102 × 10¹² | 3.91601 × 10¹² | 3.88108 × 10¹² | 9.82433 × 10¹⁰
GREEDY2 | 3.74350 × 10¹² | 3.76202 × 10¹² | 3.75014 × 10¹² | 3.74936 × 10¹² | 6.10139 × 10⁹
GREEDY3 | 3.74776 × 10¹² | 3.76237 × 10¹² | 3.75455 × 10¹² | 3.75456 × 10¹² | 5.24513 × 10⁹
GREEDY5 | 3.74390 × 10¹² | 3.77031 × 10¹² | 3.75345 × 10¹² | 3.75298 × 10¹² | 7.17733 × 10⁹
GREEDY7 | 3.74446 × 10¹² | 3.77208 × 10¹² | 3.75277 × 10¹² | 3.75190 × 10¹² | 7.40052 × 10⁹
GREEDY10 | 3.74493 × 10¹² | 3.76031 × 10¹² | 3.75159 × 10¹² | 3.75185 × 10¹² | 5.26553 × 10⁹
GREEDY15 | 3.74472 × 10¹² | 3.77922 × 10¹² | 3.75426 × 10¹² | 3.75519 × 10¹² | 9.79855 × 10⁹
GREEDY20 | 3.75028 × 10¹² | 3.76448 × 10¹² | 3.75586 × 10¹² | 3.75573 × 10¹² | 3.97310 × 10⁹
GREEDY25 | 3.74770 × 10¹² | 3.76224 × 10¹² | 3.75500 × 10¹² | 3.75572 × 10¹² | 4.95370 × 10⁹
GREEDY30 | 3.75014 × 10¹² | 3.76010 × 10¹² | 3.75583 × 10¹² | 3.75661 × 10¹² | 3.45280 × 10⁹
GREEDY50 | 3.74676 × 10¹² | 3.77396 × 10¹² | 3.76021 × 10¹² | 3.75933 × 10¹² | 9.09159 × 10⁹
GH-VNS1 | 3.74310 × 10¹² | 3.76674 × 10¹² | 3.74911 × 10¹² | 3.74580 × 10¹² | 6.99859 × 10⁹
GH-VNS2 | 3.75106 × 10¹² | 3.77369 × 10¹² | 3.75792 × 10¹² | 3.75782 × 10¹² | 6.67960 × 10⁹
GH-VNS3 | 3.75923 × 10¹² | 3.77964 × 10¹² | 3.76722 × 10¹² | 3.76812 × 10¹² | 6.00125 × 10⁹
SWAP3 (the best of SWAP) | 3.75128 × 10¹² | 3.79170 × 10¹² | 3.77853 × 10¹² | 3.77214 × 10¹² | 4.53608 × 10⁹
GA-1 | 3.84979 × 10¹² | 3.99291 × 10¹² | 3.92266 × 10¹² | 3.92818 × 10¹² | 4.56845 × 10¹²
AdaptiveGreedy | 3.74340 × 10¹² | 3.76313 × 10¹² | 3.74851 × 10¹² | 3.75037 × 10¹² | 5.56298 × 10⁹
IHEPC data set: 2,075,259 data vectors in ℝ⁷. Achieved SSE summarized after 30 runs.

Algorithm or Neighborhood | Min (Record) | Max (Worst) | Average | Median | Std.dev
---|---|---|---|---|---
Lloyd-MS | 12,874.8652 | 12,880.0703 | 12,876.0219 | 12,874.8652 | 2.2952 |
j-Means-MS | 12,874.8652 | 13,118.6455 | 12,984.7081 | 12,962.1323 | 75.6539 |
all GREEDY1-15 (equal results) | 12,874.8633 | 12,874.8633 | 12,874.8633 | 12,874.8633 | 0.0000 |
GH-VNS1 | 12,874.8633 | 12,874.8633 | 12,874.8633 | 12,874.8633 | 0.0000 |
GH-VNS2 | 12,874.8633 | 12,874.8633 | 12,874.8633 | 12,874.8633 | 0.0000 |
GH-VNS3 | 12,874.8633 | 12,874.8633 | 12,874.8633 | 12,874.8633 | 0.0000 |
GA-1 | 12,874.8643 | 12,874.8652 | 12,874.8644 | 12,874.8643 | 0.0004 |
AdaptiveGreedy | 12,874.8633 | 12,874.8633 | 12,874.8633 | 12,874.8633 | 0.0000 |
IHEPC data set: 2,075,259 data vectors in ℝ⁷, k = 50 clusters, time limitation 5 min. Achieved SSE summarized after 30 runs.

Algorithm or Neighborhood | Min (Record) | Max (Worst) | Average | Median | Std.dev
---|---|---|---|---|---
Lloyd-MS | 5605.0625 | 5751.1982 | 5671.0820 | 5660.4429 | 54.2467 |
j-Means-MS | 5160.2700 | 6280.6440 | 5496.6539 | 5203.5679 | 493.7311 |
GREEDY1 | 5200.9268 | 5431.3647 | 5287.4101 | 5281.7300 | 77.0460 |
GREEDY2 | 5167.1482 | 5283.3894 | 5171.6509 | 5192.1274 | 7.7203 |
GREEDY3 | 5155.5166 | 5178.4063 | 5166.5360 | 5164.6045 | 8.1580 |
GREEDY5 | 5164.6040 | 5178.4336 | 5170.8829 | 5174.0938 | 6.0904 |
GREEDY7 | 5162.5381 | 5178.1269 | 5168.7218 | 5171.8292 | 6.4518 |
GREEDY10 | 5154.2017 | 5176.4502 | 5162.0460 | 5160.4014 | 7.2029 |
GREEDY12 | 5162.8715 | 5181.0281 | 5166.8952 | 5165.3295 | 6.0172 |
GREEDY15 | 5163.2500 | 5181.1333 | 5167.3385 | 5165.8037 | 5.7910 |
GREEDY20 | 5156.2852 | 5176.6855 | 5166.2013 | 5164.6323 | 7.8749 |
GREEDY25 | 5166.9820 | 5181.8529 | 5175.0317 | 5176.2136 | 6.1471 |
GREEDY30 | 5168.6309 | 5182.4351 | 5175.2414 | 5176.4512 | 6.4635 |
GREEDY50 | 5168.3887 | 5182.4321 | 5177.5249 | 5177.6855 | 5.4437 |
GH-VNS1 | 5155.5166 | 5164.6313 | 5158.6549 | 5157.6812 | 3.7467 |
GH-VNS2 | 5159.8818 | 5176.6855 | 5167.3365 | 5166.9512 | 5.6808 |
GH-VNS3 | 5171.2969 | 5182.4321 | 5175.0468 | 5174.0752 | 3.6942 |
GA-1 | 5215.9521 | 5248.4521 | 5230.2839 | 5226.0386 | 13.2694 |
AdaptiveGreedy | 5153.5640 | 5163.9316 | 5157.0822 | 5155.5198 | 3.6034 |
References
- Berkhin, P. Survey of Clustering Data Mining Techniques; Accrue Software: New York, NY, USA, 2002.
- Cormack, R.M. A Review of Classification. J. R. Stat. Soc. Ser. A 1971, 134, 321–367.
- Tsai, C.Y.; Chiu, C.C. A VNS-based hierarchical clustering method. In Proceedings of the 5th WSEAS International Conference on Computational Intelligence, Man-Machine Systems and Cybernetics (CIMMACS'06), Venice, Italy, 20–22 November 2006; World Scientific and Engineering Academy and Society (WSEAS): Stevens Point, WI, USA, 2006; pp. 268–275.
- Lloyd, S.P. Least Squares Quantization in PCM. IEEE Trans. Inf. Theory 1982, 28, 129–137.
- MacQueen, J.B. Some Methods of Classification and Analysis of Multivariate Observations. In Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, CA, USA, 21 June–18 July 1965 and 27 December 1965–7 January 1966; Volume 1, pp. 281–297.
- Drineas, P.; Frieze, A.; Kannan, R.; Vempala, S.; Vinay, V. Clustering large graphs via the singular value decomposition. Mach. Learn. 2004, 56, 9–33.
- Gu, Y.; Li, K.; Guo, Z.; Wang, Y. Semi-supervised k-means DDoS detection method using hybrid feature selection algorithm. IEEE Access 2019, 7, 351–365.
- Guo, X.; Zhang, X.; He, Y.; Jin, Y.; Qin, H.; Azhar, M.; Huang, J.Z. A Robust k-Means Clustering Algorithm Based on Observation Point Mechanism. Complexity 2020, 2020, 3650926.
- Milligan, G.W. Clustering validation: Results and implications for applied analyses. In Clustering and Classification; Arabie, P., Hubert, L.J., Soete, G., Eds.; World Scientific: River Edge, NJ, USA, 1996; pp. 341–375.
- Steinley, D.; Brusco, M. Choosing the Number of Clusters in K-Means Clustering. Psychol. Methods 2011, 16, 285–297.
- Garey, M.; Johnson, D.; Witsenhausen, H. The complexity of the generalized Lloyd–Max problem (Corresp.). IEEE Trans. Inf. Theory 1982, 28, 255–256.
- Aloise, D.; Deshpande, A.; Hansen, P.; Popat, P. NP-hardness of Euclidean sum-of-squares clustering. Mach. Learn. 2009, 75, 245–248.
- Cooper, L. Heuristic methods for location-allocation problems. SIAM Rev. 1964, 6, 37–53.
- Jiang, J.L.; Yuan, X.M. A heuristic algorithm for constrained multi-source Weber problem: The variational inequality approach. Eur. J. Oper. Res. 2007, 187, 357–370.
- Arthur, D.; Manthey, B.; Roglin, H. k-Means Has Polynomial Smoothed Complexity. In Proceedings of the 2009 50th Annual IEEE Symposium on Foundations of Computer Science (FOCS'09), Atlanta, GA, USA, 25–27 October 2009; IEEE Computer Society: Washington, DC, USA, 2009; pp. 405–414.
- Sabin, M.J.; Gray, R.M. Global convergence and empirical consistency of the generalized Lloyd algorithm. IEEE Trans. Inf. Theory 1986, 32, 148–155.
- Emelianenko, M.; Ju, L.; Rand, A. Nondegeneracy and Weak Global Convergence of the Lloyd Algorithm in Rd. SIAM J. Numer. Anal. 2009, 46, 1423–1441.
- Pham, D.T.; Afify, A.A. Clustering techniques and their applications in engineering. Proc. Inst. Mech. Eng. Part C J. Mech. Eng. Sci. 2007, 221, 1445–1459.
- Fisher, D.; Xu, L.; Carnes, J.R.; Reich, Y.; Fenves, J.; Chen, J.; Shiavi, R.; Biswas, G.; Weinberg, J. Applying AI clustering to engineering tasks. IEEE Expert 1993, 8, 51–60.
- Gheorghe, G.; Cartina, G.; Rotaru, F. Using K-Means Clustering Method in Determination of the Energy Losses Levels from Electric Distribution Systems. In Proceedings of the International Conference on Mathematical Methods and Computational Techniques in Electrical Engineering, Timisoara, Romania, 21–23 October 2010; pp. 52–56.
- Kersten, P.R.; Lee, J.S.; Ainsworth, T.L. Unsupervised classification of polarimetric synthetic aperture radar images using fuzzy clustering and EM clustering. IEEE Trans. Geosci. Remote Sens. 2005, 43, 519–527.
- Cesarotti, V.; Rossi, L.; Santoro, R. A neural network clustering model for miscellaneous components production planning. Prod. Plan. Control 1999, 10, 305–316.
- Kundu, B.; White, K.P., Jr.; Mastrangelo, C. Defect clustering and classification for semiconductor devices. In Proceedings of the 45th Midwest Symposium on Circuits and Systems, Tulsa, OK, USA, 4–7 August 2002; Volume 2, pp. II-561–II-564.
- Vernet, A.; Kopp, G.A. Classification of turbulent flow patterns with fuzzy clustering. Eng. Appl. Artif. Intell. 2002, 15, 315–326.
- Afify, A.A.; Dimov, S.; Naim, M.M.; Valeva, V. Detecting cyclical disturbances in supply networks using data mining techniques. In Proceedings of the 2nd European Conference on Management of Technology, Birmingham, UK, 10–12 September 2006; pp. 1–8.
- Jain, A.K.; Murty, M.N.; Flynn, P.J. Data clustering: A review. ACM Comput. Surv. 1999, 31, 264–323.
- Naranjo, J.E.; Saha, R.; Tariq, M.T.; Hadi, M.; Xiao, Y. Pattern Recognition Using Clustering Analysis to Support Transportation System Management, Operations, and Modeling. J. Adv. Transp. 2019.
- Kadir, R.A.; Shima, Y.; Sulaiman, R.; Ali, F. Clustering of public transport operation using K-means. In Proceedings of the 2018 IEEE 3rd International Conference on Big Data Analysis (ICBDA), Shanghai, China, 9–12 March 2018; pp. 427–532.
- Sesham, A.; Padmanabham, P.; Govardhan, A. Application of Factor Analysis to k-means Clustering Algorithm on Transportation Data. IJCA 2014, 95, 40–46.
- Deb Nath, R.P.; Lee, H.J.; Chowdhury, N.K.; Chang, J.W. Modified K-Means Clustering for Travel Time Prediction Based on Historical Traffic Data. LNCS 2010, 6276, 511–521.
- Montazeri-Gh, M.; Fotouhi, A. Traffic condition recognition using the k-means clustering method. Sci. Iran. 2011, 18, 930–937.
- Farahani, R.Z.; Hekmatfar, M. Facility Location: Concepts, Models, Algorithms and Case Studies; Springer: Berlin/Heidelberg, Germany, 2009.
- Drezner, Z.; Hamacher, H. Facility Location: Applications and Theory; Springer: Berlin, Germany, 2004; pp. 119–143.
- Klastorin, T.D. The p-Median Problem for Cluster Analysis: A Comparative Test Using the Mixture Model Approach. Manag. Sci. 1985, 31, 84–95.
- Brusco, M.J.; Kohn, H.F. Optimal Partitioning of a Data Set Based on the p-Median Model. Psychometrika 2008, 73, 89–105.
- Kaufman, L.; Rousseeuw, P.J. Clustering by means of Medoids. In Statistical Data Analysis Based on the L1-Norm and Related Methods; Dodge, Y., Ed.; Birkhäuser: Basel, Switzerland, 1987; pp. 405–416.
- Schubert, E.; Rousseeuw, P. Faster k-Medoids Clustering: Improving the PAM, CLARA, and CLARANS Algorithms. arXiv 2019, arXiv:1810.05691.
- Park, H.S.; Jun, C.H. A simple and fast algorithm for K-medoids clustering. Expert Syst. Appl. 2009, 36, 3336–3341.
- Hakimi, S.L. Optimum Locations of Switching Centers and the Absolute Centers and Medians of a Graph. Oper. Res. 1964, 12, 450–459.
- Masuyama, S.; Ibaraki, T.; Hasegawa, T. The Computational Complexity of the m-Center Problems on the Plane. Trans. Inst. Electron. Commun. Eng. Jpn. 1981, 64E, 57–64.
- Kariv, O.; Hakimi, S.L. An Algorithmic Approach to Network Location Problems. II: The p-Medians. SIAM J. Appl. Math. 1979, 37, 539–560.
- Kuenne, R.E.; Soland, R.M. Exact and approximate solutions to the multisource Weber problem. Math. Program. 1972, 3, 193–209.
- Ostresh, L.M., Jr. The Stepwise Location-Allocation Problem: Exact Solutions in Continuous and Discrete Spaces. Geogr. Anal. 1978, 10, 174–185.
- Rosing, K.E. An optimal method for solving the (generalized) multi-Weber problem. Eur. J. Oper. Res. 1992, 58, 414–426.
- Blum, C.; Roli, A. Metaheuristics in combinatorial optimization: Overview and conceptual comparison. ACM Comput. Surv. 2001, 35, 268–308.
- Neema, M.N.; Maniruzzaman, K.M.; Ohgai, A. New Genetic Algorithms Based Approaches to Continuous p-Median Problem. Netw. Spat. Econ. 2011, 11, 83–99.
- Hoos, H.H.; Stutzle, T. Stochastic Local Search: Foundations and Applications; Springer: Berlin, Germany, 2005.
- Bang-Jensen, J.; Chiarandini, M.; Goegebeur, Y.; Jorgensen, B. Mixed Models for the Analysis of Local Search Components. In Proceedings of the Engineering Stochastic Local Search Algorithms International Workshop, Brussels, Belgium, 6–8 September 2007; pp. 91–105.
- Cohen-Addad, V.; Mathieu, C. Effectiveness of local search for geometric optimization. In Proceedings of the 31st International Symposium on Computational Geometry (SoCG 2015), Eindhoven, The Netherlands, 22–25 June 2015; pp. 329–343.
- Kochetov, Y.; Mladenović, N.; Hansen, P. Local search with alternating neighborhoods. Discret. Anal. Oper. Res. 2003, 2, 11–43. (In Russian)
- Kanungo, T.; Mount, D.M.; Netanyahu, N.S.; Piatko, C.D.; Silverman, R.; Wu, A.Y. A local search approximation algorithm for k-means clustering. Comput. Geom. Theory Appl. 2004, 28, 89–112.
- Page, E.S. On Monte Carlo methods in congestion problems. I: Searching for an optimum in discrete situations. Oper. Res. 1965, 13, 291–299.
- Hromkovic, J. Algorithmics for Hard Problems: Introduction to Combinatorial Optimization, Randomization, Approximation, and Heuristics; Springer: Berlin/Heidelberg, Germany, 2011.
- Ng, T. Expanding Neighborhood Tabu Search for facility location problems in water infrastructure planning. In Proceedings of the 2014 IEEE International Conference on Systems, Man, and Cybernetics (SMC), San Diego, CA, USA, 5–8 October 2014; pp. 3851–3854.
- Mladenovic, N.; Brimberg, J.; Hansen, P.; Moreno-Perez, J.A. The p-median problem: A survey of metaheuristic approaches. Eur. J. Oper. Res. 2007, 179, 927–939.
- Reese, J. Solution methods for the p-median problem: An annotated bibliography. Networks 2006, 48, 125–142.
- Brimberg, J.; Drezner, Z.; Mladenovic, N.; Salhi, S. A New Local Search for Continuous Location Problems. Eur. J. Oper. Res. 2014, 232, 256–265.
- Drezner, Z.; Brimberg, J.; Mladenovic, N.; Salhi, S. New heuristic algorithms for solving the planar p-median problem. Comput. Oper. Res. 2015, 62, 296–304.
- Drezner, Z.; Brimberg, J.; Mladenovic, N.; Salhi, S. Solving the planar p-median problem by variable neighborhood and concentric searches. J. Glob. Optim. 2015, 63, 501–514.
- Arthur, D.; Vassilvitskii, S. k-Means++: The Advantages of Careful Seeding. In Proceedings of SODA'07, SIAM, New Orleans, LA, USA, 7–9 January 2007; pp. 1027–1035.
- Bradley, P.S.; Fayyad, U.M. Refining Initial Points for K-Means Clustering. In Proceedings of the Fifteenth International Conference on Machine Learning (ICML '98), Madison, WI, USA, 24–27 July 1998; Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 1998; pp. 91–99.
- Bhusare, B.B.; Bansode, S.M. Centroids Initialization for K-Means Clustering using Improved Pillar Algorithm. Int. J. Adv. Res. Comput. Eng. Technol. 2014, 3, 1317–1322.
- Yang, J.; Wang, J. Tag clustering algorithm LMMSK: Improved K-means algorithm based on latent semantic analysis. J. Syst. Eng. Electron. 2017, 28, 374–384.
- Mishra, N.; Oblinger, D.; Pitt, L. Sublinear time approximate clustering. In Proceedings of the 12th SODA, Washington, DC, USA, 7–9 January 2001; pp. 439–447.
- Eisenbrand, F.; Grandoni, F.; Rothvosz, T.; Schafer, G. Approximating connected facility location problems via random facility sampling and core detouring. In Proceedings of SODA 2008, San Francisco, CA, USA, 20–22 January 2008; ACM: New York, NY, USA, 2008; pp. 1174–1183.
- Jaiswal, R.A.; Kumar, A.; Sen, S. Simple D2-Sampling Based PTAS for k-Means and Other Clustering Problems. Algorithmica 2014, 70, 22–46.
- Avella, P.; Boccia, M.; Salerno, S.; Vasilyev, I. An Aggregation Heuristic for Large Scale p-median Problem. Comput. Oper. Res. 2012, 39, 1625–1632.
- Kaufman, L.; Rousseeuw, P.J. Finding Groups in Data: An Introduction to Cluster Analysis; Wiley: New York, NY, USA, 1990.
- Francis, R.L.; Lowe, T.J.; Rayco, M.B.; Tamir, A. Aggregation error for location models: Survey and analysis. Ann. Oper. Res. 2009, 167, 171–208.
- Pelleg, D.; Moore, A. Accelerating Exact k-Means with Geometric Reasoning; Technical Report CMU-CS-00-105; Carnegie Mellon University: Pittsburgh, PA, USA, 2000.
- Borgelt, C. Even Faster Exact k-Means Clustering. LNCS 2020, 12080, 93–105.
- Lai, J.Z.C.; Huang, T.J.; Liaw, Y.C. A Fast k-Means Clustering Algorithm Using Cluster Center Displacement. Pattern Recognit. 2009, 42, 2551–2556.
- Mladenovic, N.; Hansen, P. Variable Neighborhood Search. Comput. Oper. Res. 1997, 24, 1097–1100.
- Hansen, P. Variable Neighborhood Search. In Search Methodologies; Burke, E.K., Kendall, G., Eds.; Springer: New York, NY, USA, 2005; pp. 211–238.
- Hansen, P.; Mladenovic, N. Variable Neighborhood Search. In Handbook of Heuristics; Martí, R., Pardalos, P., Resende, M., Eds.; Springer: Cham, Switzerland, 2018.
- Brimberg, J.; Hansen, P.; Mladenovic, N. Attraction Probabilities in Variable Neighborhood Search. 4OR-Q. J. Oper. Res. 2010, 8, 181–194.
- Hansen, P.; Mladenovic, N.; Perez, J.A.M. Variable Neighborhood Search: Methods and Applications. 4OR-Q. J. Oper. Res. 2008, 6, 319–360.
- Hansen, P.; Brimberg, J.; Urosevic, D.; Mladenovic, N. Solving Large p-Median Clustering Problems by Primal Dual Variable Neighborhood Search. Data Min. Knowl. Discov. 2009, 19, 351–375.
- Rozhnov, I.P.; Orlov, V.I.; Kazakovtsev, L.A. VNS-Based Algorithms for the Centroid-Based Clustering Problem. Facta Univ. Ser. Math. Inform. 2019, 34, 957–972.
- Hansen, P.; Mladenovic, N. J-Means: A new local search heuristic for minimum sum-of-squares clustering. Pattern Recognit. 2001, 34, 405–413.
- Martins, P. Goal Clustering: VNS Based Heuristics. Available online: https://arxiv.org/abs/1705.07666v4 (accessed on 24 October 2020).
- Carrizosa, E.; Mladenovic, N.; Todosijevic, R. Variable neighborhood search for minimum sum-of-squares clustering on networks. Eur. J. Oper. Res. 2013, 230, 356–363.
- Roux, M. A Comparative Study of Divisive and Agglomerative Hierarchical Clustering Algorithms. J. Classif. 2018, 35, 345–366.
- Sharma, A.; López, Y.; Tsunoda, T. Divisive hierarchical maximum likelihood clustering. BMC Bioinform. 2017, 18, 546.
- Venkat Reddy, M.; Vivekananda, M.; Satish, R.U.V.N. Divisive Hierarchical Clustering with K-means and Agglomerative Hierarchical Clustering. IJCST 2017, 5, 6–11.
- Sun, Z.; Fox, G.; Gu, W.; Li, Z. A parallel clustering method combined information bottleneck theory and centroid-based clustering. J. Supercomput. 2014, 69, 452–467.
- Kuehn, A.A.; Hamburger, M.J. A heuristic program for locating warehouses. Manag. Sci. 1963, 9, 643–666.
- Alp, O.; Erkut, E.; Drezner, Z. An Efficient Genetic Algorithm for the p-Median Problem. Ann. Oper. Res. 2003, 122, 21–42.
- Cheng, J.; Chen, X.; Yang, H.; Leng, M. An enhanced k-means algorithm using agglomerative hierarchical clustering strategy. In Proceedings of the International Conference on Automatic Control and Artificial Intelligence (ACAI 2012), Xiamen, China, 3–5 March 2012; pp. 407–410.
- Kazakovtsev, L.A.; Antamoshkin, A.N. Genetic Algorithm with Fast Greedy Heuristic for Clustering and Location Problems. Informatica 2014, 3, 229–240.
- Pelleg, D.; Moore, A. X-means: Extending K-means with Efficient Estimation of the Number of Clusters. In Proceedings of the International Conference on Machine Learning (ICML), Sydney, Australia, 8–12 July 2002.
- Ahmed, M.; Seraj, R.; Islam, S.M.S. The k-means Algorithm: A Comprehensive Survey and Performance Evaluation. Electronics 2020, 9, 1295.
- Frackiewicz, M.; Mandrella, A.; Palus, H. Fast Color Quantization by K-Means Clustering Combined with Image Sampling. Symmetry 2019, 11, 963.
- Zhang, G.; Li, Y.; Deng, X. K-Means Clustering-Based Electrical Equipment Identification for Smart Building Application. Information 2020, 11, 27.
- Chen, F.; Yang, Y.; Xu, L.; Zhang, T.; Zhang, Y. Big-Data Clustering: K-Means or K-Indicators? 2019. Available online: https://arxiv.org/pdf/1906.00938.pdf (accessed on 18 October 2020).
- Qin, J.; Fu, W.; Gao, H.; Zheng, W.X. Distributed k-means algorithm and fuzzy c-means algorithm for sensor networks based on multiagent consensus theory. IEEE Trans. Cybern. 2016, 47, 772–783.
- Shindler, M.; Wong, A.; Meyerson, A. Fast and accurate k-means for large datasets. In Proceedings of the 24th International Conference on Neural Information Processing Systems (NIPS'11), Sydney, Australia, 13–16 December 2011; Curran Associates Inc.: Red Hook, NY, USA, 2011; pp. 2375–2383.
- Hedar, A.R.; Ibrahim, A.M.M.; Abdel-Hakim, A.E.; Sewisy, A.A. K-Means Cloning: Adaptive Spherical K-Means Clustering. Algorithms 2018, 11, 151.
- Xu, T.S.; Chiang, H.D.; Liu, G.Y.; Tan, C.W. Hierarchical k-means method for clustering large-scale advanced metering infrastructure data. IEEE Trans. Power Deliv. 2015, 32, 609–616.
- Wang, X.D.; Chen, R.C.; Yan, F.; Zeng, Z.Q.; Hong, C.Q. Fast adaptive k-means subspace clustering for high-dimensional data. IEEE Access 2019, 7, 639–651.
- Zechner, M.; Granitzer, M. Accelerating K-Means on the Graphics Processor via CUDA. In Proceedings of the International Conference on Intensive Applications and Services, Valencia, Spain, 20–25 April 2009; pp. 7–15.
- Luebke, D.; Humphreys, G. How GPUs work. Computer 2007, 40, 96–110.
- Maulik, U.; Bandyopadhyay, S. Genetic Algorithm-Based Clustering Technique. Pattern Recognit. 2000, 33, 1455–1465.
- Krishna, K.; Murty, M. Genetic K-Means algorithm. IEEE Trans. Syst. Man Cybern. Part B 1999, 29, 433–439.
- Singh, N.; Singh, D.P.; Pant, B. ACOCA: Ant Colony Optimization Based Clustering Algorithm for Big Data Preprocessing. Int. J. Math. Eng. Manag. Sci. 2019, 4, 1239–1250.
- Merwe, D.W.; Engelbrecht, A.P. Data Clustering Using Particle Swarm Optimization. In Proceedings of the 2003 Congress on Evolutionary Computation, Canberra, Australia, 8–12 December 2003; pp. 215–220.
- Nikolaev, A.; Mladenovic, N.; Todosijevic, R. J-means and I-means for minimum sum-of-squares clustering on networks. Optim. Lett. 2017, 11, 359–376.
- Fränti, P.; Sieranoja, S. K-means properties on six clustering benchmark datasets. Appl. Intell. 2018, 48, 4743–4759.
- Clustering Basic Benchmark. Available online: http://cs.joensuu.fi/sipu/datasets/ (accessed on 15 September 2020).
- Kazakovtsev, L.; Shkaberina, G.; Rozhnov, I.; Li, R.; Kazakovtsev, V. Genetic Algorithms with the Crossover-Like Mutation Operator for the k-Means Problem. CCIS 2020, 1275, 350–362.
- Brimberg, J.; Mladenovic, N. A variable neighborhood algorithm for solving the continuous location-allocation problem. Stud. Locat. Anal. 1996, 10, 1–12.
- Miskovic, S.; Stanimirovich, Z.; Grujicic, I. An efficient variable neighborhood search for solving a robust dynamic facility location problem in emergency service network. Electron. Notes Discret. Math. 2015, 47, 261–268.
- Crainic, T.G.; Gendreau, M.; Hansen, P.; Hoeb, N.; Mladenovic, N. Parallel variable neighbourhood search for the p-median. In Proceedings of the 4th Metaheuristics International Conference (MIC 2001), Porto, Portugal, 16–21 July 2001; pp. 595–599.
- Hansen, P.; Mladenovic, N. Variable neighborhood search for the p-median. Locat. Sci. 1997, 5, 207–226.
- Wen, M.; Krapper, E.; Larsen, J.; Stidsen, T.K. A multilevel variable neighborhood search heuristic for a practical vehicle routing and driver scheduling problem. Networks 2011, 58, 311–323.
- Baldassi, C. Recombinator-k-Means: Enhancing k-Means++ by Seeding from Pools of Previous Runs. Available online: https://arxiv.org/abs/1905.00531v1 (accessed on 18 September 2020).
- Duarte, A.; Mladenović, N.; Sánchez-Oro, J.; Todosijević, R. Variable Neighborhood Descent. In Handbook of Heuristics; Martí, R., Panos, P., Resende, M., Eds.; Springer: Cham, Switzerland, 2016.
- Dua, D.; Graff, C. UCI Machine Learning Repository 2019. Available online: http://archive.ics.uci.edu/ml (accessed on 30 September 2020).
- Molla, M.M.; Nag, P.; Thohura, S.; Khan, A. A Graphics Process Unit-Based Multiple-Relaxation-Time Lattice Boltzmann Simulation of Non-Newtonian Fluid Flows in a Backward Facing Step. Computation 2020, 8, 83.
- Kazakovtsev, L.A.; Rozhnov, I.P.; Popov, E.A.; Karaseva, M.V.; Stupina, A.A. Parallel implementation of the greedy heuristic clustering algorithms. IOP Conf. Ser. Mater. Sci. Eng. 2019, 537, 022052.
- Zhang, T.; Ramakrishnan, R.; Livny, M. BIRCH: An Efficient Data Clustering Method for Very Large Databases. In Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data (SIGMOD'96), Montreal, QC, Canada, 4–6 June 1996; ACM: New York, NY, USA, 1996; pp. 103–114.
- Smucker, M.D.; Allan, J.; Carterette, B.A. Comparison of Statistical Significance Tests for Information Retrieval. In Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management (CIKM '07), Lisbon, Portugal, 6–10 November 2007; ACM: New York, NY, USA, 2007; pp. 623–632.
- Park, H.M. Comparing Group Means: The t-Test and One-way ANOVA Using STATA, SAS, and SPSS; Indiana University: Bloomington, IN, USA, 2009.
- Mann, H.B.; Whitney, D.R. On a Test of Whether one of Two Random Variables is Stochastically Larger than the Other. Ann. Math. Stat. 1947, 18, 50–60.
- Fay, M.P.; Proschan, M.A. Wilcoxon-Mann-Whitney or t-Test? On Assumptions for Hypothesis Tests and Multiple Interpretations of Decision Rules. Stat. Surv. 2010, 4, 1–39.
- Burke, E.; Gendreau, M.; Hyde, M.; Kendall, G.; Ochoa, G.; Ozkan, E.; Qu, R. Hyper-heuristics: A survey of the state of the art. J. Oper. Res. Soc. 2013, 64, 1695–1724.
- Stanovov, V.; Semenkin, E.; Semenkina, O. Self-configuring hybrid evolutionary algorithm for fuzzy imbalanced classification with adaptive instance selection. J. Artif. Intell. Soft Comput. Res. 2016, 6, 173–188.
- Semenkina, M.; Semenkin, E. Hybrid Self-configuring Evolutionary Algorithm for Automated Design of Fuzzy Classifier. LNCS 2014, 8794, 310–317.
Comparison of AdaptiveGreedy with its closest competitors on each problem. pt is the p-value of Student's t-test and pU the p-value of the Mann-Whitney U test for the difference from the AdaptiveGreedy results; ↑/⇑ denote a statistically significant difference, ↔/⇔ an insignificant one.

Algorithm or Neighborhood | Min (Record) | Max (Worst) | Average | Median | Std.dev | p-Value and Significance
---|---|---|---|---|---|---
BIRCH3 data set: 10⁵ data vectors in ℝ², k = 300 clusters, time limitation 10 s | | | | | |
GREEDY200 | 1.30773 × 10¹³ | 1.31172 × 10¹³ | 1.30916 × 10¹³ | 1.30912 × 10¹³ | 1.08001 × 10¹⁰ | pt = 0.4098 ↔
AdaptiveGreedy | 1.30807 × 10¹³ | 1.31113 × 10¹³ | 1.30922 × 10¹³ | 1.30925 × 10¹³ | 0.87731 × 10¹⁰ | pU = 0.2337 ⇔
BIRCH3 data set: 10⁵ data vectors in ℝ², k = 100 clusters, time limitation 10 s | | | | | |
GREEDY5 | 3.71485 × 10¹³ | 3.72087 × 10¹³ | 3.71644 × 10¹³ | 3.71518 × 10¹³ | 2.22600 × 10¹⁰ | pt = 0.0701 ↔
AdaptiveGreedy | 3.71484 × 10¹³ | 3.72011 × 10¹³ | 3.71726 × 10¹³ | 3.71749 × 10¹³ | 2.02784 × 10¹⁰ | pU = 0.1357 ⇔
Mopsi-Joensuu data set: 6014 data vectors in ℝ², k = 300 clusters, time limitation 5 s | | | | | |
GH-VNS3 | 0.4321 | 0.6838 | 0.6024 | 0.6139 | 0.0836 | pU = 0.00005 ⇑
GREEDY200 | 0.4555 | 1.0154 | 0.6746 | 0.5882 | 0.2163 | pt < 0.00001 ↑
AdaptiveGreedy | 0.3128 | 0.6352 | 0.4672 | 0.4604 | 0.1026 |
Mopsi-Joensuu data set: 6014 data vectors in ℝ², k = 100 clusters, time limitation 5 s | | | | | |
GREEDY100 | 1.8021 | 2.2942 | 2.0158 | 1.9849 | 0.1860 | pt = 0.0910 ↔
GH-VNS3 | 1.7643 | 2.7357 | 2.0513 | 1.9822 | 0.2699 | pU = 0.0042 ⇑
AdaptiveGreedy | 1.7759 | 2.3265 | 1.9578 | 1.9229 | 0.1523 |
Mopsi-Joensuu data set: 6014 data vectors in ℝ², k = 30 clusters, time limitation 5 s | | | | | |
GH-VNS1 | 18.3147 | 18.3255 | 18.3238 | 18.3253 | 0.0039 | pt = 0.4118 ↔
AdaptiveGreedy | 18.3146 | 18.3258 | 18.3240 | 18.3253 | 0.0037 | pU = 0.2843 ⇔
Mopsi-Finland data set: 13,467 data vectors in ℝ², k = 300 clusters, time limitation 5 s | | | | | |
GH-VNS3 | 5.33373 × 10⁸ | 7.29800 × 10⁸ | 5.74914 × 10⁸ | 5.48427 × 10⁸ | 5.05346 × 10⁷ | pt = 0.1392 ↔
AdaptiveGreedy | 5.27254 × 10⁸ | 7.09410 × 10⁸ | 5.60867 × 10⁸ | 5.38952 × 10⁸ | 4.89257 × 10⁷ | pU = 0.0049 ⇑
Mopsi-Finland data set: 13,467 data vectors in ℝ², k = 30 clusters, time limitation 5 s | | | | | |
GH-VNS3 | 3.42528 × 10¹⁰ | 3.47955 × 10¹⁰ | 3.43826 × 10¹⁰ | 3.43474 × 10¹⁰ | 1.02356 × 10⁸ | pt = 0.0520 ↔
AdaptiveGreedy | 3.42528 × 10¹⁰ | 3.47353 × 10¹⁰ | 3.43385 × 10¹⁰ | 3.43473 × 10¹⁰ | 1.03984 × 10⁸ | pU = 0.0001 ⇑
S1 data set: 5000 data vectors in ℝ², k = 15 clusters, time limitation 1 s | | | | | |
GH-VNS2 | 8.91703 × 10¹² | 8.91703 × 10¹² | 8.91703 × 10¹² | 8.91703 × 10¹² | 0.0000 | pt = 0.5 ↔
AdaptiveGreedy | 8.91703 × 10¹² | 8.91703 × 10¹² | 8.91703 × 10¹² | 8.91703 × 10¹² | 0.0000 | pU = 0.5 ⇔
S1 data set: 5000 data vectors in ℝ², k = 50 clusters, time limitation 1 s | | | | | |
GH-VNS1 | 3.74310 × 10¹² | 3.76674 × 10¹² | 3.74911 × 10¹² | 3.74580 × 10¹² | 6.99859 × 10⁹ | pt = 0.3571 ↔
AdaptiveGreedy | 3.74340 × 10¹² | 3.76313 × 10¹² | 3.74851 × 10¹² | 3.75037 × 10¹² | 5.56298 × 10⁹ | pU = 0.28434 ⇔
IHEPC data set: 2,075,259 data vectors in ℝ⁷, k = 50 clusters, time limitation 5 min | | | | | |
GREEDY10 | 5154.2017 | 5176.4502 | 5162.0460 | 5160.4014 | 7.2029 | pt = 0.008 ↑
AdaptiveGreedy | 5153.5640 | 5163.9316 | 5157.0822 | 5155.5198 | 3.6034 | pU = 0.001 ⇑
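For reference, p-values such as pt and pU in the table above can be computed with SciPy from two samples of 30 achieved SSE values. Whether the authors used a pooled-variance or Welch t-test and a one- or two-sided alternative is not stated, so the choices below are assumptions, and the input arrays are placeholders rather than the actual experimental results.

```python
import numpy as np
from scipy import stats

# Placeholder samples standing in for 30 achieved SSE values per algorithm;
# the real inputs would be the per-run results behind the table above.
rng = np.random.default_rng(0)
sse_adaptive = rng.normal(5157.0, 3.6, size=30)
sse_greedy10 = rng.normal(5162.0, 7.2, size=30)

t_stat, p_t = stats.ttest_ind(sse_adaptive, sse_greedy10)   # Student's t-test
u_stat, p_u = stats.mannwhitneyu(sse_adaptive, sse_greedy10,
                                 alternative="two-sided")   # Mann-Whitney U test
print(f"p_t = {p_t:.4f}, p_U = {p_u:.4f}")
```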
Mopsi-Finland data set: 13,467 data vectors in ℝ². Achieved SSE summarized after 30 runs.

Algorithm or Neighborhood | Min (Record) | Max (Worst) | Average | Median | Std.dev
---|---|---|---|---|---
k = 300 | | | | |
GH-VNS3 | 5.33373 × 10⁸ | 7.29800 × 10⁸ | 5.85377 × 10⁸ | 5.52320 × 10⁸ | 5.59987 × 10⁷
AdaptiveGreedy | 5.27254 × 10⁸ | 7.09410 × 10⁸ | 5.59033 × 10⁸ | 5.38888 × 10⁸ | 4.60585 × 10⁷
k = 30 | | | | |
GH-VNS2 | 3.42528 × 10¹⁰ | 3.48723 × 10¹⁰ | 3.43916 × 10¹⁰ | 3.43474 × 10¹⁰ | 1.46818 × 10⁸
GH-VNS3 | 3.42528 × 10¹⁰ | 3.46408 × 10¹⁰ | 3.43731 × 10¹⁰ | 3.43474 × 10¹⁰ | 7.81989 × 10⁷
AdaptiveGreedy | 3.42528 × 10¹⁰ | 3.46274 × 10¹⁰ | 3.43337 × 10¹⁰ | 3.43473 × 10¹⁰ | 8.13882 × 10⁷