Article

Self-Adjusting Variable Neighborhood Search Algorithm for Near-Optimal k-Means Clustering

Reshetnev Siberian State University of Science and Technology, Institute of Informatics and Telecommunications, Krasnoyarskiy Rabochiy av. 31, 660037 Krasnoyarsk, Russia
* Author to whom correspondence should be addressed.
Computation 2020, 8(4), 90; https://doi.org/10.3390/computation8040090
Submission received: 9 October 2020 / Revised: 30 October 2020 / Accepted: 2 November 2020 / Published: 5 November 2020
(This article belongs to the Section Computational Engineering)

Abstract

The k-means problem is one of the most popular models in cluster analysis; it minimizes the sum of the squared distances from clustered objects to the sought cluster centers (centroids). The simplicity of its algorithmic implementation encourages researchers to apply it in a variety of engineering and scientific branches. Nevertheless, the problem is proven to be NP-hard, which makes exact algorithms inapplicable for large-scale problems, and the simplest and most popular algorithms result in very poor values of the sum of squared distances. If a problem must be solved within a limited time with the maximum accuracy, which would be difficult to improve using known methods without increasing computational costs, the variable neighborhood search (VNS) algorithms, which search in randomized neighborhoods formed by the application of greedy agglomerative procedures, are competitive. In this article, we investigate the influence of the most important parameter of such neighborhoods on the computational efficiency and propose a new VNS-based algorithm (solver), implemented on the graphics processing unit (GPU), which adjusts this parameter. Benchmarking on data sets containing up to millions of objects demonstrates the advantage of the new algorithm over known local search algorithms within a fixed time allowing for online computation.

1. Introduction

1.1. Problem Statement

The aim of solving a clustering problem is to divide a given set (sample) of objects (data vectors) into disjoint subsets, called clusters, so that each cluster consists of similar objects, while the objects of different clusters differ significantly [1,2]. The clustering problem belongs to a wide class of unsupervised machine learning problems. Clustering models involve various similarity or dissimilarity measures. The k-means model with the squared Euclidean distance as a dissimilarity measure is based exclusively on the maximum similarity (minimum sum of squared distances) among objects within clusters.
Clustering methods can be divided into two main categories: hierarchical and partitioning [1,3]. Partitioning clustering, such as k-means, aims at optimizing the clustering result in accordance with a pre-defined objective function [3].
The k-means problem [4,5], also known as minimum sum-of-squares clustering (MSSC), assumes that the objects being clustered are described by numerical features. Each object is represented by a point in the feature space $\mathbb{R}^d$ (data vector). It is required to find a given number k of cluster centers (called centroids), such as to minimize the sum of the squared distances from the data vectors to the nearest centroid.
Let $A_1, \dots, A_N \in \mathbb{R}^d$ be data vectors, N be the number of them, and $S = \{X_1, \dots, X_k\} \subset \mathbb{R}^d$ be the set of sought centroids. The objective function (sum of squared errors, SSE) of the k-means optimization problem formulated by MacQueen [5] is:
$$ SSE(X_1, \dots, X_k) = SSE(S) = \sum_{i=1}^{N} \min_{X \in \{X_1, \dots, X_k\}} \left\| A_i - X \right\|^2 \to \min_{X_1, \dots, X_k \in \mathbb{R}^d}. \qquad (1) $$
Here, $\|\cdot\|$ is the Euclidean norm, and the integer k must be known in advance.
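For illustration, a minimal C++ sketch of evaluating the SSE objective (1) for a given set of centroids might look as follows (the function names `squared_distance` and `sse` and the data layout are our own assumptions, not part of the article's implementation):

```cpp
#include <algorithm>
#include <limits>
#include <vector>

// Squared Euclidean distance between two d-dimensional vectors.
double squared_distance(const std::vector<double>& a, const std::vector<double>& b) {
    double s = 0.0;
    for (std::size_t t = 0; t < a.size(); ++t) {
        const double diff = a[t] - b[t];
        s += diff * diff;
    }
    return s;
}

// SSE objective (1): every data vector contributes its squared distance
// to the nearest of the k centroids.
double sse(const std::vector<std::vector<double>>& data,
           const std::vector<std::vector<double>>& centroids) {
    double total = 0.0;
    for (const auto& a : data) {
        double nearest = std::numeric_limits<double>::infinity();
        for (const auto& x : centroids)
            nearest = std::min(nearest, squared_distance(a, x));
        total += nearest;
    }
    return total;
}
```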
A cluster in the k-means problem is a subset of data vectors for which the specified centroid is the nearest one:
$$ C_j = \left\{ A_i,\ i = \overline{1, N} \ \middle|\ \left\| A_i - X_j \right\| = \min_{X \in \{X_1, \dots, X_k\}} \left\| A_i - X \right\| \right\}, \qquad j = \overline{1, k}. \qquad (2) $$
We assume that a data vector cannot belong to two clusters at the same time. If a data vector is equidistant from several centroids, the assignment can be resolved by clustering algorithms in different ways; for example, the data vector may be assigned to the cluster with the smaller index:
$$ C_j = \left\{ A_i,\ i = \overline{1, N} \ \middle|\ \forall j' = \overline{1, k},\ j' \neq j:\ \left\| A_i - X_j \right\| < \left\| A_i - X_{j'} \right\| \ \text{or}\ \left( \left\| A_i - X_j \right\| = \left\| A_i - X_{j'} \right\| \ \text{and}\ j' > j \right) \right\}, \qquad j = \overline{1, k}. $$
Usually, for practical problems with sufficiently accurate measured values of data vectors, the assignment to a specific cluster is not very important.
The objective function may also be formulated as follows:
$$ SSE(X_1, \dots, X_k) = \sum_{j=1}^{k} \sum_{i \in \overline{1,N}:\ A_i \in C_j} \left\| A_i - X_j \right\|^2 \to \min_{X_1, \dots, X_k \in \mathbb{R}^d}, \qquad (3) $$
or
$$ SSE(C_1, \dots, C_k) = \sum_{j=1}^{k} \sum_{i \in \overline{1,N}:\ A_i \in C_j} \left\| A_i - X_j \right\|^2 \to \min_{C_1, \dots, C_k \subset \{A_1, \dots, A_N\}}. \qquad (4) $$
Equations (3) and (4) correspond to continuous and discrete statements of our problem, respectively.
Such clustering problem statements have a number of drawbacks. In particular, the number of clusters k must be given in advance, which is hardly possible for the majority of practically important problems. Furthermore, the adequacy of the result in the case of complex cluster shapes is questionable (the model is proven to work well with ball-shaped clusters [6]). The result is sensitive to outliers (standalone objects) [7,8] and depends on the chosen distance measure and the data normalization method. The model does not take into account the dissimilarity between objects in different clusters, and the application of the k-means model yields some solution X1, …, Xk even in cases with no cluster structure in the data [9,10]. Moreover, the NP-hardness [11,12] of the problem makes exact methods [6] applicable only to very small problems.
Nevertheless, the simplicity of the most commonly used algorithmic realization as well as the interpretability of the results make the k-means problem the most popular clustering model. Developers’ efforts are focused on the design of heuristic algorithms that provide acceptable and attainable values of the objective function.

1.2. State of the Art

The most commonly used algorithm for solving problem (1) is Lloyd's procedure, proposed in 1957 and published in 1982 [4], also known as the k-means algorithm or the alternate location-allocation (ALA) algorithm [13,14]. This algorithm consists of two simple alternating steps: the first solves the simplest continuous (quadratic) optimization problem (3), finding the optimal positions of the centroids X1, …, Xk for a fixed composition of clusters, and the second solves the simplest combinatorial optimization problem (4) by redistributing data vectors between clusters at fixed positions of the centroids. Both steps aim at minimizing the SSE. Although the theoretical estimation of its computational complexity is quite high [15,16,17], in practice, the algorithm quickly converges to a local minimum. The algorithm starts with some initial solution S = {X1, …, Xk}, for instance, chosen at random, and its result is highly dependent on this choice. In the case of large-scale problems, this simple algorithm is incapable of obtaining the most accurate solutions.
Various clustering models are widely used in many engineering applications [18,19], such as energy loss detection [20], image segmentation [21], production planning [22], classification of products such as semiconductor devices [23], recognition of turbulent flow patterns [24], and cyclical disturbance detection in supply networks [25]. Clustering is also used as a preprocessing step for the supervised classification [26].
In [27], Naranjo et al. use various clustering approaches including the k-means model for automatic classification of traffic incidents. The approach proposed in [28] uses the k-means clustering model for the optimal scheduling of public transportation. Sesham et al. [29] use factor analysis methods in a combination with the k-means clustering for detecting cluster structures in transportation data obtained from the interview survey. Such data include the geographic information (home addresses) and general route information. The use of GPS sensors [30] for collecting traffic data provides us with large data arrays for such problems as the travel time prediction, traffic condition recognition [31], etc.
The k-means problem can be classified as a continuous location problem [32,33]: it is aimed at finding the optimal location of centroids in a continuous space.
If we replace squared distances with distances in (1), we deal with the very similar continuous k-median location problem [34] which is also a popular clustering model [35]. The k-medoids [36,37] problem is its discrete version where cluster centers must be selected among the data vectors only, which allows us to calculate the distance matrix in advance [38]. The k-median problem was also formulated as a discrete location problem [39] on a graph. The similarity of these NP-hard problems [40,41] enables us to use similar approaches to solve them. In the early attempts to solve the k-median problem (its discrete version) by exact methods, researchers used a branch and bound algorithm [42,43,44] for solving very small problems.
Metaheuristic approaches, such as genetic algorithms [45], are aimed at finding the global optimum. However, in large-scale instances, such approaches require very significant computational costs, especially if they are adapted to solving continuous problems [46].
With regard to discrete optimization problems, local search methods, which include Lloyd’s procedure, have been developed since the 1950s [47,48,49,50]. These methods have been successfully used to solve location problems [51,52]. The progress of local search methods is associated with both new algorithmic schemes and new theoretical results in the field of local search [50].
A standard local search algorithm starts with some initial solution S and goes to a neighboring solution if this solution turns out to be superior. Moreover, finding the set of neighbor solutions n(S) is the key issue. Elements of this set are formed by applying a certain procedure to a solution S. At each local search step, the neighborhood function n(S) specifies the set of possible search directions. Neighborhood functions can be very diverse, and the neighborhood relation is not always symmetric [53,54].
For a review of heuristic solution techniques applied to k-means and k-median problems, the reader can refer to [32,55,56]. Brimberg, Drezner, Mladenovic, and Salhi [57,58,59] presented local search approaches including the variable neighborhood search (VNS) and concentric search. In [58], Drezner et al. proposed heuristic procedures, including a genetic algorithm (GA), for rather small data sets. Algorithms for finding the initial solution for Lloyd's procedure [60,61] are aimed at improving the average resulting solution. For example, in [62], Bhusare et al. propose an approach that spreads the initial centroids so that the distances among them are as large as possible. The most popular k-means++ initialization method introduced by Arthur and Vassilvitskii [60] is a probabilistic implementation of the same idea. An approach proposed by Yang and Wang [63] improves the traditional k-means clustering algorithm by choosing initial centroids with a min-max similarity. Gu et al. [7] provide a density-based initial cluster center selection method to solve the problem of outliers. Such smart initialization algorithms reduce the search area for local search algorithms in multi-start modes. Nevertheless, they do not guarantee an optimal or near-optimal solution of problem (1).
Many authors propose approaches based on reducing the amount of data [64]: simplification of the problem by random (or deterministic) selection of a subset of the initial data set for a preliminary solution of the k-means problem, and using these results as an initial solution to the k-means algorithm on the complete data set [65,66,67]. Such aggregation approaches, summarized in [68], as well as reducing the number of the data vectors [69], enable us to solve large-scale problems within a reasonable time. However, such approaches lead to a further reduction in accuracy. Moreover, many authors [70,71] call their algorithms "exact", which does not mean the ability to achieve an exact solution of (1). In such algorithms, the word "exact" means exact adherence to the scheme of Lloyd's procedure, without any aggregation, sampling, or relaxation approaches. Thus, such algorithms may be faster than Lloyd's procedure due to the use of the triangle inequality, storing the results of distance calculations in multidimensional data structures, or other tricks [72]; however, they are not intended to obtain the best value of (1). In our research, aimed at obtaining the most precise solutions, we consider only the methods which estimate the objective function (1) directly, without aggregation or approximation approaches.
The main idea of the variable neighborhood search algorithms proposed by Hansen and Mladenovic [73,74,75] is the alternation of neighborhood functions n(S). Such algorithms include Lloyd’s procedure, which alternates finding a locally optimal solution of a continuous optimization problem (3) with a solution of a combinatorial problem (4). However, as applied to the k-means problem, the VNS class traditionally involves more complex algorithms.
The VNS algorithms are used for a wide variety of problems [3,76,77] including clustering [78] and work well for solving k-means and similar problems [50,79,80,81,82].
Agglomerative and dissociative procedures are separate classes of clustering algorithms. Dissociative (divisive) procedures [83] are based on splitting clusters into smaller clusters. Such algorithms are commonly used for small problems due to their high computing complexity [83,84,85], most often in hierarchical clustering models. The agglomerative approach is the most popular in hierarchical clustering, however, it is also applied in other models of cluster analysis. Agglomerative procedures [86,87,88,89,90] combine clusters sequentially, i.e., in relation to the k-means problem, they sequentially remove centroids. The elements of the clusters, related to the removed centroids, are redistributed among the remaining clusters. The greedy strategies are used to decide which clusters are most similar to be merged together [3] at each iteration of the agglomerative procedure. An agglomerative procedure starts with some solution S containing an excessive number of centroids and clusters k + r, where integer r is known in advance or chosen randomly. The r value (number of excessive centroids in the temporary solution) is the most important parameter of the agglomerative procedure. Some algorithms, including those based on the k-means model [91], involve both the agglomerative and dissociative approaches. Moreover, such algorithms are not aimed at achieving the best value of the objective function (1), and their accuracy is not high in this sense.

1.3. Research Gap

Many transportation and other problems (e.g., clustering problems related to computer vision) require online computation within a fixed time. As mentioned above, Lloyd's procedure, the most popular k-means clustering algorithm, is rather fast. Nevertheless, for specific data sets, including geographic/geometrical data, this algorithm results in a solution which is very far from the global minimum of the objective function (1), and the multi-start operation mode does not improve the result significantly. More accurate k-means clustering methods are much slower. Nevertheless, recent advances in high-performance computing and the use of massively parallel systems enable us to work through a large amount of computation using Lloyd's procedure embedded into more complex algorithmic schemes. Thus, the demand for clustering algorithms that offer a compromise between the time spent on computation and the resulting objective function (1) value is apparent. Nevertheless, in some cases, when solving problem (1), it is required to obtain a result (a value of the objective function) within a limited fixed time, which would be difficult to improve on by known methods without a significant increase in computational costs. Such results are required if the cost of error is high, as well as for evaluating faster algorithms, as reference solutions.
Agglomerative procedures, despite their relatively high computational complexity, can be successfully integrated into more complex search schemes. They can be used as a part of the crossover operator of genetic algorithms [46,88] and as a part of the VNS algorithms. Moreover, such algorithms are a compromise between the solution accuracy and time costs. In this article, by accuracy, we mean exclusively the ability of the algorithm (solver) to obtain the minimum values of the objective function (1).
The use of VNS algorithms that search in the neighborhoods formed by applying greedy agglomerative procedures to a known (current) solution S enables us to obtain good results in a fixed time acceptable for interactive modes of operation. The selection of such procedures, their sequence, and their parameters remained an open question. The efficiency of such procedures has been experimentally shown on some test and practical problems. Various versions of VNS algorithms based on greedy agglomerative procedures differ significantly in their results, which makes such algorithms difficult to use in practical problems. It is practically impossible to forecast the relative performance of a specific VNS algorithm based on such generalized numerical features of the problem as the sample size and the number of clusters. Moreover, the efficiency of such procedures depends on their parameters. However, the type and nature of this dependence has not been studied.

1.4. Our Contribution

In this article, we systematize approaches to the construction of search algorithms in neighborhoods, formed by the use of greedy agglomerative procedures.
In this work, we proceeded from the following assumptions:
(a)
The choice of parameter r value (the number of excessive centroids, see above) of the greedy agglomerative heuristic procedure significantly affects the efficiency of the procedure.
(b)
Since it is hardly possible to determine the optimal value of this parameter based on such numerical parameters of the k-means problem as the number of data vectors and the number of clusters, reconnaissance (exploratory) search with various values of r can be useful.
(c)
Unlike the well-known VNS algorithms that use greedy agglomerative heuristic procedures with an increasing value of the parameter r, a gradual decrease in the value of this parameter may be more effective.
Based on these assumptions, we propose a new VNS algorithm involving greedy agglomerative procedures for the k-means problem, which, by adjusting the initial parameter r of such procedures, enables us to obtain, within a fixed time, results that exceed those of known VNS algorithms. Due to its self-adjusting capabilities, such an algorithm should be more versatile, which should increase its applicability to a wider range of problems in comparison with known VNS algorithms based on greedy agglomerative procedures.

1.5. Structure of this Article

The rest of this article is organized as follows. In Section 2, we present an overview of the most common local search algorithms for k-means and similar problems, and introduce the notion of the SWAPr and GREEDYr neighborhoods. It is shown experimentally that the search result in these neighborhoods strongly depends on the neighborhood parameter r (the number of simultaneously replaced or added centroids). In addition, we present a new VNS algorithm which performs the local search in alternating GREEDYr neighborhoods with a decreasing value of r and its initial value estimated by a special auxiliary procedure. In Section 3, we describe our computational experiments with the new and known algorithms. In Section 4, we consider the applicability of the results on the adjustment of the GREEDYr neighborhood parameter in algorithmic schemes other than VNS, in particular, in evolutionary algorithms with a greedy agglomerative crossover operator. The conclusions are given in Section 5.

2. Materials and Methods

For constructing a more efficient algorithm (solver), we used a combination of such algorithms as Lloyd's procedure, greedy agglomerative clustering procedures, and the variable neighborhood search. The most computationally expensive part of this new algorithmic construction, Lloyd's procedure, was implemented on graphics processing units (GPUs).

2.1. The Simplest Approach

Lloyd’s procedure, the simplest and most popular algorithm for solving the k-means problem, is described as follows (see Algorithm 1).
Algorithm 1. Lloyd(S)
Require: Set of initial centroids S = {X1, …, Xk}. If S is not given, then the initial centroids are selected randomly from the set of data vectors {A1, …, AN}.
repeat
1. For each centroid Xj, j = 1, …, k, define its cluster in accordance with (2); // i.e., assign each data vector to the nearest centroid
2. For each cluster Cj, j = 1, …, k, calculate its centroid as follows:
$$ X_j = \frac{\sum_{i \in \overline{1,N}:\ A_i \in C_j} A_i}{|C_j|}. $$
until all centroids stay unchanged.
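A compact CPU-side sketch of Algorithm 1 in C++ might look as follows (this is our illustration only; the authors' implementation runs the expensive steps on the GPU, see Section 2.6, and the names used here are assumptions):

```cpp
#include <limits>
#include <vector>

// One possible CPU sketch of Algorithm 1. Returns the final centroids;
// assignment[i] receives the cluster index of data vector i.
// Assumes a non-empty data set.
std::vector<std::vector<double>> lloyd(
        const std::vector<std::vector<double>>& data,
        std::vector<std::vector<double>> centroids,
        std::vector<int>& assignment) {
    const std::size_t n = data.size(), k = centroids.size(), d = data[0].size();
    assignment.assign(n, -1);
    bool changed = true;
    while (changed) {                       // until assignments (and centroids) stay unchanged
        changed = false;
        // Step 1: assign each data vector to the nearest centroid, cf. (2).
        for (std::size_t i = 0; i < n; ++i) {
            double best = std::numeric_limits<double>::infinity();
            int best_j = 0;
            for (std::size_t j = 0; j < k; ++j) {
                double s = 0.0;
                for (std::size_t t = 0; t < d; ++t) {
                    const double diff = data[i][t] - centroids[j][t];
                    s += diff * diff;
                }
                if (s < best) { best = s; best_j = static_cast<int>(j); }
            }
            if (assignment[i] != best_j) { assignment[i] = best_j; changed = true; }
        }
        // Step 2: recompute each centroid as the mean of its cluster.
        std::vector<std::vector<double>> sum(k, std::vector<double>(d, 0.0));
        std::vector<std::size_t> count(k, 0);
        for (std::size_t i = 0; i < n; ++i) {
            for (std::size_t t = 0; t < d; ++t) sum[assignment[i]][t] += data[i][t];
            ++count[assignment[i]];
        }
        for (std::size_t j = 0; j < k; ++j)
            if (count[j] > 0)
                for (std::size_t t = 0; t < d; ++t)
                    centroids[j][t] = sum[j][t] / count[j];
    }
    return centroids;
}
```

The loop stops when no assignment changes, which implies that the centroids also stay unchanged.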
Formally, the k-means problem in its formulation (1) or (3) is a continuous optimization problem. With a fixed composition of clusters Cj, the optimal solution is found in an elementary way, see Step 2 in Algorithm 1, and this solution is a local optimum of the problem in terms of continuous optimization theory, i.e., a local optimum in the ε-neighborhood. A large number of such optima forces the algorithm designers to systematize their search in some way. The first step of Lloyd's algorithm solves a simple combinatorial optimization problem (4) on the redistribution of data vectors among clusters, that is, it searches in the other neighborhood.
The simplicity of Lloyd’s procedure enables us to apply it to a wide range of problems, including face detection, image segmentation, signal processing and many others [92]. Frackiewicz et al. [93] presented a color quantization method based on downsampling of the original image and k-means clustering on a downsampled image. The k-means clustering algorithm used in [94] was proposed for identifying electrical equipment of a smart building.
In many cases, researchers do not distinguish between the k-means model and the k-means algorithm, as Lloyd’s procedure is also called. Nevertheless, the result of Lloyd’s procedure may differ from the results of other more advanced algorithms many times in the objective function value (1). For finding a more accurate solution, a wide range of heuristic methods were proposed [55]: evolutionary and other bio-inspired algorithms, as well as local search in various neighborhoods.
Modern scientific literature offers many algorithms to speed up the solution of the k-means problem. The k-indicators algorithm [95] promoted by Chen et al. is a semi-convex-relaxation algorithm for the approximate solution of big-data clustering problems. The distributed implementation of the k-means algorithm proposed in [96] considers a set of agents, each of which is equipped with a possibly high-dimensional piece of information or set of measurements. In [97,98], the researchers improved algorithms for data streams. In [99], Hedar et al. present a hierarchical k-means method for better clustering performance in the case of big data problems. This approach enables us to mitigate the poor scaling behavior with regard to computing time and memory requirements. A fast adaptive k-means subspace clustering algorithm with an adaptive loss function for high-dimensional data was proposed by Wang et al. [100]. Nevertheless, the usage of massively parallel systems is the most efficient way to achieve the most significant acceleration of computations, and the original Lloyd's procedure (Algorithm 1) can be seamlessly parallelized on such systems [101,102].
Metaheuristic approaches for the k-means and similar problems include genetic algorithms [46,103,104], the ant colony clustering hybrid algorithm proposed in [105], particle swarm optimization algorithms [106]. Almost all of these algorithms in one way or another use the Lloyd’s procedure or other local search procedures. Our new algorithm (solver) is not an exception.

2.2. Local Search in SWAP Neighborhoods

Local search algorithms differ in forms of neighborhood function n(S). A local minimum in one neighborhood may not be a local minimum in another neighborhood [50]. The choice of a neighborhood of lower cardinality leads to a decrease in the complexity of the search step, however, a wider neighborhood can lead to a better local minimum. We have to find a balance between these conflicting requirements [50].
A popular idea when solving k-means, k-medoids, k-median problems is to search for a better solution in SWAP neighborhoods. This idea was realized, for instance, in the J-means procedure [80] proposed by Hansen and Mladenovic, and similar I-means algorithm [107]. In SWAP neighborhoods, the set n(S) is the set of solutions obtained from S by replacing one or more centroids with some data vectors.
Let us denote the neighborhood where r centroids must be simultaneously replaced by SWAPr(S). The SWAPr neighborhood search can be regular (all possible substitutions are sequentially enumerated), as in the J-Means algorithm, or randomized (centroids and data vectors for replacement are selected randomly). In both cases, the search in the SWAP neighborhood always alternates with Lloyd's procedure: if an improved solution is found in the SWAP neighborhood, Lloyd's procedure is applied to this new solution, and then the algorithm returns to the SWAP neighborhood search. Except for very small problems, regular search in SWAP neighborhoods, with the exception of the SWAP1 neighborhood and sometimes SWAP2, is almost never used due to its computational complexity: in each iteration, all possible replacement options must be tested. A randomized search in SWAPr neighborhoods can be highly efficient for sufficiently large problems, as demonstrated by the experiment described below; here, the correct choice of r is of great importance.
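A minimal sketch of one randomized SWAPr move, built on the `lloyd` and `sse` helpers sketched above (our illustration; in this simplified version the r replaced centroids are drawn independently, so duplicates are possible):

```cpp
#include <random>
#include <vector>

// One randomized SWAP_r step: replace r randomly chosen centroids with r
// randomly chosen data vectors, run Lloyd's procedure, and accept the new
// solution only if the SSE improved. Assumes lloyd() and sse() from the
// sketches above.
bool swap_r_step(const std::vector<std::vector<double>>& data,
                 std::vector<std::vector<double>>& centroids,
                 int r, std::mt19937& rng) {
    auto candidate = centroids;
    std::uniform_int_distribution<std::size_t> pick_centroid(0, centroids.size() - 1);
    std::uniform_int_distribution<std::size_t> pick_vector(0, data.size() - 1);
    for (int t = 0; t < r; ++t)
        candidate[pick_centroid(rng)] = data[pick_vector(rng)];
    std::vector<int> assignment;
    candidate = lloyd(data, candidate, assignment);
    if (sse(data, candidate) < sse(data, centroids)) {
        centroids = candidate;
        return true;   // improving move accepted
    }
    return false;
}
```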
As can be seen in Figure 1, for various problems from the clustering benchmark repository [108,109], the best results are achieved with different values of r, although in general, such a search provides better results in comparison with Lloyd's procedure. Our computational experiments are described in detail in Section 2.5, Section 2.6 and Section 2.7.

2.3. Agglomerative Approach and GREEDYr Neighborhoods

When solving the k-means and similar problems, the agglomerative approach is often successful. In [86], Sun et al. propose a parallel clustering method based on MapReduce model which implements the information bottleneck clustering (IBC) idea. In the IBC and other agglomerative clustering algorithms, clusters are sequentially removed one-by-one, and objects are redistributed among the remaining clusters. Alp et al. [88] presented a genetic algorithm for facility location problems, where evolution is facilitated by a greedy agglomerative heuristic procedure. A genetic algorithm with a faster greedy heuristic procedure for clustering and location problems was also proposed in [90]. In [46], two genetic algorithm approaches with different crossover procedures are used to solve k-median problem in continuous space.
Greedy agglomerative procedures can be used as independent algorithms, as well as being embedded into genetic operators [110] or VNS algorithms [79]. The basic greedy agglomerative procedure for the k-means problem can be described as follows (see Algorithm 2).
Algorithm 2. BasicGreedy(S)
Require: Set of initial centroids S = {X1, …, XK}, K > k, required final number of centroids k.
S ← Lloyd(S);
while |S| > k do
    for i = 1, …, K do
       Fi ← SSE(S \ {Xi});
    end for
    Select a subset S′ ⊂ S of rtoremove centroids with the minimum values of the corresponding
    variables Fi; // By default, rtoremove = 1.
    S ← Lloyd(S \ S′);
end while.
In its most commonly used version, with rtoremove = 1, this procedure is rather slow for large-scale problems. It tries to remove the centroids one by one. At each iteration, it eliminates the centroids whose elimination results in the least significant increase in the SSE value. Further, this procedure involves Lloyd's procedure, which can also be slow in the case of rather large problems with many clusters. To improve the performance of such a procedure, the number of simultaneously eliminated centroids can be calculated as $r_{toremove} = \max\{1, \lfloor (|S| - k) \cdot r_{coef} \rfloor\}$. In [90], Kazakovtsev and Antamoshkin used the elimination coefficient value $r_{coef}$ = 0.2. This means that at each iteration, up to 20% of the excessive centroids are eliminated; such values have been shown to make the algorithm faster. In this research, we use the same value.
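A rough CPU sketch of Algorithm 2 with the batch elimination coefficient $r_{coef}$ might look as follows (again our illustration built on the helpers above; it evaluates each candidate elimination by a naive full SSE computation, whereas the article's GPU implementation uses Equations (5) and (6) and Algorithm 9 instead):

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Sketch of Algorithm 2: starting from K > k centroids, repeatedly eliminate
// the centroids whose removal increases the SSE the least until k remain.
// Assumes lloyd() and sse() from the sketches above.
std::vector<std::vector<double>> basic_greedy(
        const std::vector<std::vector<double>>& data,
        std::vector<std::vector<double>> S, std::size_t k,
        double r_coef = 0.2) {
    std::vector<int> assignment;
    S = lloyd(data, S, assignment);
    while (S.size() > k) {
        // F[i] = SSE of the solution with centroid i removed.
        std::vector<double> F(S.size());
        for (std::size_t i = 0; i < S.size(); ++i) {
            auto reduced = S;
            reduced.erase(reduced.begin() + i);
            F[i] = sse(data, reduced);
        }
        // Eliminate up to r_coef of the excessive centroids at once.
        const std::size_t to_remove = std::max<std::size_t>(
            1, static_cast<std::size_t>((S.size() - k) * r_coef));
        std::vector<std::size_t> order(S.size());
        std::iota(order.begin(), order.end(), 0);
        std::sort(order.begin(), order.end(),
                  [&](std::size_t a, std::size_t b) { return F[a] < F[b]; });
        std::vector<std::size_t> removed(order.begin(), order.begin() + to_remove);
        std::sort(removed.rbegin(), removed.rend());   // erase from the back
        for (std::size_t idx : removed) S.erase(S.begin() + idx);
        S = lloyd(data, S, assignment);
    }
    return S;
}
```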
In [79,90,110], the authors embed the BasicGreedy() procedure into three algorithms which differ in r value only. All of these algorithms can be described as follows (see Algorithm 3):
Algorithm 3. Greedy(S, S2, r)
Require: Two sets of centroids S, S2, |S| = |S2| = k; the number r of centroids of the solution S2 which are used to obtain the resulting solution, r ∈ {1, …, k}.
for i = 1, …, nrepeats do
   1. Select a subset S′ ⊆ S2: |S′| = r.
   2. S′ ← BasicGreedy(S ∪ S′);
   3. if SSE(S′) < SSE(S) then S ← S′ end if;
end for
return S.
Such procedures use various values of r from 1 up to k. If r = 1, the algorithm selects the subsets (actually, single elements) of S2 regularly: {X1} in the first iteration, {X2} in the second one, etc.; in this case, nrepeats = k. If r = k, then obviously S′ = S2 and nrepeats = 1. Otherwise, r is selected randomly, r ∈ {2, …, k − 1}, and nrepeats depends on r: nrepeats = max{1, ⌊k/r⌋}.
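Under these parameter rules, Algorithm 3 can be sketched as follows (our interpretation; `basic_greedy` and `sse` are the helpers sketched above):

```cpp
#include <algorithm>
#include <numeric>
#include <random>
#include <vector>

// Sketch of Algorithm 3: merge r centroids of the donor solution S2 into S,
// shrink the merged set back to k centroids with basic_greedy(), and keep
// the candidate only if it improves the SSE.
std::vector<std::vector<double>> greedy_step(
        const std::vector<std::vector<double>>& data,
        std::vector<std::vector<double>> S,
        const std::vector<std::vector<double>>& S2,
        std::size_t r, std::mt19937& rng) {
    const std::size_t k = S.size();
    r = std::min(r, k);
    const std::size_t nrepeats = (r == 1) ? k : std::max<std::size_t>(1, k / r);
    std::vector<std::size_t> idx(k);
    std::iota(idx.begin(), idx.end(), 0);
    for (std::size_t rep = 0; rep < nrepeats; ++rep) {
        // Choose a subset S' of r centroids of S2: regularly for r = 1
        // (X1, X2, ... in turn), randomly otherwise.
        std::vector<std::vector<double>> merged = S;
        if (r == 1) {
            merged.push_back(S2[rep]);
        } else {
            std::shuffle(idx.begin(), idx.end(), rng);
            for (std::size_t t = 0; t < r; ++t) merged.push_back(S2[idx[t]]);
        }
        auto candidate = basic_greedy(data, merged, k);
        if (sse(data, candidate) < sse(data, S)) S = candidate;
    }
    return S;
}
```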
If the solution S2 is fixed, then all possible results of applying the Greedy(S, S2, r) procedure form a neighborhood of the solution S, and S2 as well as r are parameters of such a neighborhood. If S2 is a randomly chosen locally optimal solution obtained by the Lloyd(S2′) procedure applied to a randomly chosen subset S2′ ⊂ {A1, …, AN}, |S2′| = k, then we deal with a randomized neighborhood.
Let us denote such a neighborhood by GREEDYr(S). Our experiments in Section 3 demonstrate that the obtained result of the local search in GREEDYr neighborhoods strongly depends on r.

2.4. Variable Neighborhood Search

The dependence of the local search result on the neighborhood selection is reduced if we use a certain set of neighborhoods and alternate them. This approach is the basis for VNS algorithms. The idea of alternating neighborhoods is easy to adapt to various problems [76,77,78] and highly efficient, which makes it very useful for solving NP-hard problems including clustering, location, and vehicle routing problems. In [111,112], Brimberg and Mladenovic and Miskovic et al. used the VNS for solving various facility location problems. Crainic et al. [113] as well as Hansen and Mladenovic [114] proposed and developed a parallel VNS algorithm for the k-median problem. In [115], a VNS algorithm was used for vehicle routing and driver scheduling problems by Wen et al.
The ways of neighborhood alternation may differ significantly. Many VNS algorithms are not even classified by their authors as VNS algorithms. For example, the algorithm in [57] alternates between discrete and continuous problems: when solving a discrete problem, the set of local optima is replenished, and then such local optima are chosen as elements of the initial solution of the continuous problem. A similar idea of the recombinator k-means algorithm was proposed by C. Baldassi [116]. This algorithm restarts the k-means procedure, using the results of previous runs as a reservoir of candidates for the new initial solutions, exploiting the popular k-means++ seeding algorithm to piece them together into new, promising initial configurations. Thus, the k-means search alternates with the discrete problem of finding an optimal initial centroid combination.
The VNS class includes the very efficient J-Means algorithm mentioned above [80], which alternates search in a SWAP neighborhood with the use of Lloyd's procedure. Even when searching only in the SWAP1 neighborhood, the J-Means results can be many times better than the results of Lloyd's procedure launched in the multi-start mode, as shown in [62,97].
In [50], Kochetov et al. describe such basic schemes of VNS algorithms as variable neighborhood descent (VND, see Algorithm 4) [117] and randomized Variable Neighborhood Search (RVNS, see Algorithm 5) [50].
Algorithm 4. VND(S)
Require: Initial solution S, selected neighborhoods nl, l = 1, …, lmax.
repeat
    l ← 1;
    while l ≤ lmax do
      search for S′ ∈ nl(S): f(S′) = min{f(Y) | Y ∈ nl(S)};
      if f(S′) < f(S) then S ← S′; l ← 1 else l ← l + 1 end if;
    end while;
until the stop conditions are satisfied.
Algorithm 5. RVNS(S)
Require: Initial solution S, selected neighborhoods nl, l = 1, …, lmax.
repeat
    l ← 1;
    while l ≤ lmax do
      select randomly S′ ∈ nl(S);
      if f(S′) < f(S) then S ← S′; l ← 1 else l ← l + 1 end if;
    end while;
until the stop conditions are satisfied.
Algorithms of the RVNS scheme are more efficient when solving large-scale problems [50], for which the use of deterministic VND requires too large computational costs per iteration. In many efficient algorithms, lmax = 2. For example, the J-Means algorithm combines a SWAP neighborhood search with Lloyd's procedure.
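As an illustration, the RVNS scheme of Algorithm 5 can be written generically as follows (a sketch with assumed types; each neighborhood is passed as a callable that draws one random neighbor of the current solution):

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Generic RVNS skeleton (Algorithm 5). Each neighborhood is a callable that
// returns one random neighbor of the current solution; `objective` evaluates
// a solution, and `stop` implements the stop condition (e.g., a time limit).
template <typename Solution>
Solution rvns(Solution S,
              const std::vector<std::function<Solution(const Solution&)>>& neighborhoods,
              const std::function<double(const Solution&)>& objective,
              const std::function<bool()>& stop) {
    while (!stop()) {
        std::size_t l = 0;
        while (l < neighborhoods.size()) {
            Solution candidate = neighborhoods[l](S);   // random S' in n_l(S)
            if (objective(candidate) < objective(S)) {
                S = candidate;   // improvement: return to the first neighborhood
                l = 0;
            } else {
                ++l;             // no improvement: try the next neighborhood
            }
        }
    }
    return S;
}
```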
As a rule, algorithm developers propose to move from neighborhoods of lower cardinality to wider neighborhoods. For instance, in [79], the authors propose a sequential search in the neighborhoods GREEDY1 → GREEDYrandom → GREEDYk → GREEDY1 → … Here, GREEDYrandom is a neighborhood with randomly selected r ∈ {2, …, k − 1}. In this case, the initial neighborhood type has a strong influence on the result [79]. However, the best initial value of parameter r is hardly predictable.
In this article, we propose a new RVNS algorithm which involves GREEDYr neighborhood search with a gradually decreasing r and automatic adjustment of the initial r value. Computational experiments show the advantages of this algorithm in comparison with the algorithms searching in SWAP neighborhoods as well as in comparison with known search algorithms with GREEDYr neighborhoods.

2.5. New Algorithm

A search in a GREEDYr neighborhood with a fixed r value, on various practical problems listed in the repositories [108,109,118], shows that the result (the value of the objective function) essentially depends on r, and this dependence differs for various problems, even if the problems have similar basic numerical characteristics, such as the number of data vectors N, their dimension d, and the number of clusters k. The results are shown in Figures 2 and 3. At the same time, our experiments show that at the first iterations, the use of Algorithm 3 almost always leads to an improvement in the SSE value, and then the probability of such a success decreases. Moreover, the search in neighborhoods with large r values stops giving improving results sooner, while the search in neighborhoods with small r, in particular, with r = 1, enables us to obtain improved solutions for a longer time. The search in the GREEDY1 neighborhood corresponds to the adjustment of individual centroid positions. Thus, the possible decrement of the objective function value is not the same for different values of r.
We propose the following sequence of neighborhoods: GREEDYr0 → GREEDYr1 → GREEDYr2 → … → GREEDY1 → GREEDYk → … Here, r values gradually decrease: r0 > r1 > r2 > …. After reaching r = 1, the search continues in the GREEDYk neighborhood, and after that the value of r starts decreasing again. Moreover, the r value fluctuates within certain limits at each stage of the search.
This algorithm can be described as follows (Algorithm 6).
Algorithm 6. DecreaseGreedySearch(S, r0)
Require: Initial solution S, initial value r = r0 ∈ {1, …, k}.
select randomly S2 ⊂ {A1, …, AN}, |S2| = k; S2 ← Lloyd(S2);
repeat
   nrepeats ← max{1, ⌊k/r⌋};
   for i = 1, …, nrepeats do
     1. select randomly r ∈ {max{1, ⌊r0/2⌋}, …, r0};
     2. S′ ← Greedy(S, S2, r);
     3. if SSE(S′) < SSE(S) then S ← S′ end if;
   end for;
   select randomly S2 ⊂ {A1, …, AN}, |S2| = k; S2 ← Lloyd(S2);
   if Steps 1–3 have not changed S then
     if r = 1 then r0 ← k else r0 ← max{1, ⌊r/2⌋ − 1} end if;
   end if;
until the stop conditions are satisfied (time limitation).
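The following C++ sketch mirrors Algorithm 6 (our interpretation under stated assumptions: the reference solution S2 is regenerated after every pass, the neighborhood parameter is drawn from {max{1, ⌊r0/2⌋}, …, r0}, and r0 is reduced only when a whole pass brings no improvement):

```cpp
#include <algorithm>
#include <chrono>
#include <iterator>
#include <random>
#include <vector>

// Sketch of Algorithm 6: search in GREEDY_r neighborhoods with decreasing r0.
// Assumes the helpers lloyd(), sse(), and greedy_step() sketched above
// (C++17 is required for std::sample).
std::vector<std::vector<double>> decrease_greedy_search(
        const std::vector<std::vector<double>>& data,
        std::vector<std::vector<double>> S, std::size_t r0,
        double time_limit_sec, std::mt19937& rng) {
    const std::size_t k = S.size();
    auto random_local_optimum = [&]() {
        std::vector<std::vector<double>> S2;
        std::sample(data.begin(), data.end(), std::back_inserter(S2), k, rng);
        std::vector<int> assignment;
        return lloyd(data, S2, assignment);
    };
    auto S2 = random_local_optimum();
    const auto start = std::chrono::steady_clock::now();
    auto elapsed = [&]() {
        return std::chrono::duration<double>(
                   std::chrono::steady_clock::now() - start).count();
    };
    while (elapsed() < time_limit_sec) {
        const std::size_t nrepeats = std::max<std::size_t>(1, k / r0);
        const double sse_before = sse(data, S);
        std::uniform_int_distribution<std::size_t> pick_r(
            std::max<std::size_t>(1, r0 / 2), r0);
        for (std::size_t i = 0; i < nrepeats; ++i)
            S = greedy_step(data, S, S2, pick_r(rng), rng);
        S2 = random_local_optimum();                 // fresh reference solution
        if (!(sse(data, S) < sse_before))            // no improvement in this pass
            r0 = (r0 == 1) ? k : std::max<std::size_t>(1, r0 / 2 - 1);
    }
    return S;
}
```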
Given sufficient computation time, genetic algorithms with greedy agglomerative heuristics are known to perform better than VNS algorithms, which results in better SSE values [79,90]. Despite this, the limited time and the computational complexity of the Greedy() procedure as a genetic crossover operator lead to a situation when genetic algorithms may have enough time to complete only a very limited number of crossover operations and often only reach the second or third generation of solutions. Under these conditions, VNS algorithms are a reasonable compromise between computation cost and accuracy.
The choice of the initial value of parameter r0 is highly important. Such a choice is quite simply carried out by a reconnaissance search with different r0 values. The algorithm with such an automatic adjustment of the parameter r0 by performing a reconnaissance search is described as follows (Algorithm 7).
Algorithm 7. AdaptiveGreedy(S) solver
Require: the number of reconnaissance search iterations nrecon.
select randomly S ⊂ {A1, …, AN}, |S| = k; S ← Lloyd(S);
for i = 1, …, nrecon do
   select randomly Si ⊂ {A1, …, AN}, |Si| = k; Si ← Lloyd(Si);
end for;
r ← k;
repeat
    Sr ← S; nrepeats ← max{1, ⌊k/r⌋};
    for i = 1, …, nrecon do
      for j = 1, …, nrepeats do
         S′ ← Greedy(Sr, Si, r); if SSE(S′) < SSE(Sr) then Sr ← S′ end if;
      end for;
    end for;
    r ← max{1, ⌊r/2⌋ − 1};
until r = 1;
select the value r with the minimum value of SSE(Sr);
r0 ← min{1.5·r, k};
DecreaseGreedySearch(Sr, r0).
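A sketch of Algorithm 7 might look as follows (our illustration: a reconnaissance phase over candidate r values, roughly halved from k down to 1, followed by DecreaseGreedySearch seeded with the best-performing value; the donor solutions and helper functions are the ones sketched above):

```cpp
#include <algorithm>
#include <iterator>
#include <limits>
#include <random>
#include <vector>

// Sketch of Algorithm 7: reconnaissance search over candidate r values, then
// DecreaseGreedySearch with r0 derived from the best candidate. Assumes the
// helpers lloyd(), sse(), greedy_step(), and decrease_greedy_search() above.
std::vector<std::vector<double>> adaptive_greedy(
        const std::vector<std::vector<double>>& data, std::size_t k,
        std::size_t n_recon, double time_limit_sec, std::mt19937& rng) {
    auto random_local_optimum = [&]() {
        std::vector<std::vector<double>> X;
        std::sample(data.begin(), data.end(), std::back_inserter(X), k, rng);
        std::vector<int> assignment;
        return lloyd(data, X, assignment);
    };
    auto S = random_local_optimum();
    std::vector<std::vector<std::vector<double>>> donors(n_recon);
    for (auto& D : donors) D = random_local_optimum();

    std::size_t best_r = k, r = k;
    double best_sse = std::numeric_limits<double>::infinity();
    auto best_S = S;
    for (;;) {
        auto Sr = S;
        const std::size_t nrepeats = std::max<std::size_t>(1, k / r);
        for (const auto& D : donors)                 // every donor solution S_i
            for (std::size_t j = 0; j < nrepeats; ++j)
                Sr = greedy_step(data, Sr, D, r, rng);
        const double current = sse(data, Sr);
        if (current < best_sse) { best_sse = current; best_r = r; best_S = Sr; }
        if (r == 1) break;
        r = std::max<std::size_t>(1, r / 2 - 1);     // next candidate value of r
    }
    const std::size_t r0 = std::max<std::size_t>(
        1, std::min<std::size_t>(static_cast<std::size_t>(1.5 * best_r), k));
    return decrease_greedy_search(data, best_S, r0, time_limit_sec, rng);
}
```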
Results of computational experiments described in the next Section show that our new algorithm, which sequentially decreases the value of the parameter r0, has an advantage over the known VNS algorithms.

2.6. CUDA Implementation

The greedy agglomerative procedure (BasicGreedy) is computationally expensive. In Algorithm 2, the objective function calculation Fi ← SSE(S \ {Xi}) is performed more than (K − k) · k times per run, and after each elimination, the Lloyd() procedure is executed. Therefore, such algorithms are traditionally considered as methods for solving comparatively small problems (hundreds of thousands of data points and hundreds of clusters). However, the rapid development of massively parallel processing systems (GPUs) enables us to solve large-scale problems with reasonable time expenses (seconds). A parallel (CUDA) implementation of the Lloyd() procedure is known [101,102], and we used this approach in our experiments.
Graphics processing units (GPUs) accelerate computations with the use of a multi-core computing architecture. CUDA (compute unified device architecture) is the most popular programming platform which enables us to use general-purpose programming languages (e.g., C++) for compiling GPU programs. The programming model uses the single instruction multiple thread (SIMT) principle [119]. We can declare a function in a CUDA program a "kernel" function and run this function on the streaming multiprocessors. The threads are divided into blocks. Several instances of a kernel function are executed in parallel on different nodes (blocks) of a computation grid. Each thread can be identified by the special threadIdx variable. Each thread block is identified by the blockIdx variable. The number of threads in a block is given by the blockDim variable. All these variables are 3-dimensional vectors (dimensions x, y, z). Depending on the problem solved, the interpretation of these dimensions may differ. For processing 2D graphical data, x and y are used for identifying pixel coordinates.
The most computationally expensive part of Lloyd's procedure is distance computation and comparison (Step 1 of Algorithm 1). This step can be seamlessly parallelized if we calculate the distances from each individual data vector in a separate thread. Thus, threadIdx.x and blockIdx.x must indicate a data vector. The same kernel function prepares the data needed for centroid calculation (Step 2 of Algorithm 1). Such data are the sum of the data vector coordinates in a specific cluster, $sum_j = \sum_{i \in \overline{1,N}:\ A_i \in C_j} A_i$, and the cardinality of the cluster, $counter_j = |C_j|$. Here, j is the cluster number. Variable sumj is a vector (a 1-dimensional array in the program implementation).
To perform Step 1 of Algorithm 1 on a GPU, after the initialization sumj ← 0 and counterj ← 0, the following procedure (Algorithm 8) runs on (N + blockDim.x)/blockDim.x nodes of the computation grid, with blockDim.x threads in each block (in our experiments, blockDim.x = 512):
Algorithm 8. CUDA kernel implementation of Step 1 in Lloyd's procedure (Algorithm 1)
i ← blockIdx.x · blockDim.x + threadIdx.x;
if i > N then return end if;
Dnearest ← +∞; // distance from Ai to the nearest centroid
for j = 1, …, k do
   if ‖Ai − Xj‖ < Dnearest then
     Dnearest ← ‖Ai − Xj‖;
     n ← j;
   end if
end for;
sumn ← sumn + Ai;
countern ← countern + 1;
SSE ← SSE + Dnearest². // objective function adder
If sumj and counterj are pre-calculated for each cluster, then Step 2 of Algorithm 1 is reduced to a single arithmetic operation per cluster: Xj = sumj/counterj. If the number of clusters is not huge, this operation does not take significant computation resources. Nevertheless, its parallel implementation is even simpler: we organize k threads, and each thread calculates Xj for an individual cluster. Outside Lloyd's procedure, we use Algorithm 8 for SSE value estimation (the variable SSE must be initialized with 0 in advance).
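For illustration, a CUDA C++ kernel corresponding to Algorithm 8 might look roughly as follows (our sketch; it assumes data vectors stored row-major in a flat array and uses atomic additions to protect the per-cluster accumulators that many threads update concurrently):

```cpp
#include <cfloat>

// Sketch of Algorithm 8: one thread per data vector. `data` is an N x d
// row-major array, `centroids` is k x d. Atomic additions protect the
// per-cluster accumulators `sum`, `counter` and the global SSE adder.
__global__ void assign_and_accumulate(const float* data, const float* centroids,
                                      int N, int d, int k,
                                      float* sum, int* counter, float* sse_out) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= N) return;
    float d_nearest = FLT_MAX;   // squared distance to the nearest centroid
    int n = 0;                   // index of the nearest centroid
    for (int j = 0; j < k; ++j) {
        float s = 0.0f;
        for (int t = 0; t < d; ++t) {
            float diff = data[i * d + t] - centroids[j * d + t];
            s += diff * diff;
        }
        if (s < d_nearest) { d_nearest = s; n = j; }
    }
    for (int t = 0; t < d; ++t)
        atomicAdd(&sum[n * d + t], data[i * d + t]);   // sum_n += A_i
    atomicAdd(&counter[n], 1);                         // |C_n| += 1
    atomicAdd(sse_out, d_nearest);                     // SSE adder
}
```

With blockDim.x = 512 as in our experiments, such a kernel would be launched as `assign_and_accumulate<<<(N + 511) / 512, 512>>>(…)`, matching the grid configuration described above.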
The second computationally expensive part of the BasicGreedy() algorithm is the estimation of the objective function value after eliminating a centroid [120]: Fi = SSE(S \ {Xi}). Having calculated SSE(S), we may calculate SSE(S \ {Xi}) as
$$ F_i = SSE(S \setminus \{X_i\}) = SSE(S) + \sum_{l=1}^{N} \Delta D_l, \qquad (5) $$
where
$$ \Delta D_l = \begin{cases} 0, & A_l \notin C_i, \\ \left( \min_{j \in \overline{1,k},\ j \neq i} \left\| X_j - A_l \right\| \right)^2 - \left\| X_i - A_l \right\|^2, & A_l \in C_i. \end{cases} \qquad (6) $$
For calculating (5) on a GPU, after initializing Fi ← SSE(S), the following kernel function (Algorithm 9) runs for each data vector.
Algorithm 9. CUDA kernel implementation of calculating Fi ← SSE(S \ {Xi}) in the BasicGreedy procedure (Algorithm 2)
Require: index i of the centroid being eliminated.
l ← blockIdx.x · blockDim.x + threadIdx.x;
if l > N then return end if;
Dnearest ← +∞; // distance from Al to the nearest centroid except Xi
for j = 1, …, k do
   if j ≠ i and ‖Al − Xj‖ < Dnearest then
     Dnearest ← ‖Al − Xj‖;
   end if
end for;
if Al ∈ Ci then Fi ← Fi + Dnearest² − ‖Xi − Al‖² end if; // cf. (6): only vectors of the eliminated cluster change their contribution
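A CUDA C++ counterpart of Algorithm 9 might be sketched as follows (our illustration, using the same flat data layout as above; the membership test follows Equation (6)):

```cpp
#include <cfloat>

// Sketch of Algorithm 9: one thread per data vector l. Accumulates into F[i]
// the change of the objective function caused by eliminating centroid i;
// the host initializes F[i] to SSE(S). Per Equation (6), only the vectors of
// the eliminated cluster C_i change their contribution.
__global__ void sse_without_centroid(const float* data, const float* centroids,
                                     int N, int d, int k, int i, float* F) {
    int l = blockIdx.x * blockDim.x + threadIdx.x;
    if (l >= N) return;
    float d_nearest = FLT_MAX;   // squared distance to the nearest centroid j != i
    float d_to_i = 0.0f;         // squared distance to the eliminated centroid X_i
    for (int j = 0; j < k; ++j) {
        float s = 0.0f;
        for (int t = 0; t < d; ++t) {
            float diff = data[l * d + t] - centroids[j * d + t];
            s += diff * diff;
        }
        if (j == i) { d_to_i = s; continue; }
        if (s < d_nearest) d_nearest = s;
    }
    if (d_to_i < d_nearest)                     // A_l belongs to the eliminated cluster C_i
        atomicAdd(&F[i], d_nearest - d_to_i);   // Delta D_l from Equation (6)
}
```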
All distance calculations for GREEDYr neighborhood search are performed by Algorithms 8 and 9. A similar kernel function was used for accelerating the local search in SWAP neighborhoods. In this function, after eliminating a centroid, a data point is included in solution S as a new centroid.
All other parts of new and known algorithms were implemented on the CPU.

2.7. Benchmarking Data

In all our experiments, we used the classic data sets from the UCI Machine Learning and Clustering basic benchmark repositories [108,109,118]:
(a)
Individual household electric power consumption (IHEPC)—energy consumption data of households during several years (more than 2 million data vectors, 7 dimensions), 0–1 normalized data, “date” and “time” columns removed;
(b)
BIRCH3 [121]: one hundred groups of points of random size on a plane (10^5 data vectors, 2 dimensions);
(c)
S1 data set: Gaussian clusters with cluster overlap (5000 data vectors, 2 dimensions);
(d)
Mopsi-Joensuu: geographic locations of users (6014 data vectors, 2 dimensions) in Joensuu city;
(e)
Mopsi-Finland: geographic locations of users (13,467 data vectors, 2 dimensions) in Finland.
Mopsi-Joensuu and Mopsi-Finland are “geographic” data sets with a complex cluster structure, formed under the influence of natural factors such as the geometry of the city, transport communications, and urban infrastructure (Figure 4).
In our study, we do not take into account the true labeling provided by the data set (if it is known), i.e., the given predictions for known classes, and focus on the minimization of SSE only.

2.8. Computational Environment

For our computational experiments, we used the following test system: Intel Core 2 Duo E8400 CPU, 16GB RAM, NVIDIA GeForce GTX1050ti GPU with 4096 MB RAM, floating-point performance 2138 GFLOPS. This choice of the GPU hardware was made due to its prevalence, and also one of the best values of the price/performance ratio. The program code was written in C++. We used Visual C++ 2017 compiler embedded into Visual Studio v.15.9.5, NVIDIA CUDA 10.0 Wizards, and NVIDIA Nsight Visual Studio Edition CUDA Support v.6.0.0.

3. Results

For all data sets, 30 attempts were made to run each of the algorithms (see Table 1 and Table A1, Table A2, Table A3, Table A4, Table A5, Table A6, Table A7, Table A8, Table A9, Table A10 and Table A11 in Appendix A).
For comparison, we ran local search in various GREEDYr neighborhoods with fixed r values. In addition, we ran various known Variable Neighborhood Search (VNS) algorithms with GREEDYr neighborhoods [79], see algorithms GH-VNS1-3. These algorithms use the same sequence of neighborhood types (GREEDY1 → GREEDYrandom → GREEDYk) and differ in the initial neighborhood type: GREEDY1 for GH-VNS1, GREEDYrandom for GH-VNS2, and GREEDYk for GH-VNS3. Unlike our new AdaptiveGreedy() algorithm, the GH-VNS1-3 algorithms increase r values, and this increase is not gradual. In addition, we included the genetic algorithm (denoted “GA-1” in Table A1, Table A2, Table A3, Table A4, Table A5, Table A6, Table A7, Table A8, Table A9, Table A10 and Table A11) with the single-point crossover [103], real-valued genes encoded by centroid positions, and the uniform random mutation (probability 0.01). For algorithms launched in the multi-start mode (J-Means algorithm and Lloyd's procedure), only the best results achieved in each attempt were recorded. In Table A1, Table A2, Table A3, Table A4, Table A5, Table A6, Table A7, Table A8, Table A9, Table A10 and Table A11, such algorithms are denoted Lloyd-MS and J-Means-MS, respectively.
The minimum, maximum, average, and median objective function values and the standard deviation were summarized after 30 runs. For all algorithms, we used the same implementation of the Lloyd() procedure, which consumes the vast majority of the computation time.
The best average and median values of the objective function (1) are underlined. We compared the new AdaptiveGreedy() algorithm with the known algorithm which demonstrated the best median and average results (Table 1). For comparison, we used the t-test [122,123] and non-parametric Wilcoxon-Mann-Whitney U test (Wilcoxon rank sum test) [124,125] with z approximation.
To compare the results obtained by our new algorithm, we tested the null hypothesis H0: SSEAdaptiveGreedy = SSEknown (the difference in the results is statistically insignificant) against the one-tailed research hypothesis H1: SSEAdaptiveGreedy < SSEknown (statistically different results, the new algorithm has an advantage). Here, SSEAdaptiveGreedy are the results obtained by the AdaptiveGreedy() algorithm, and SSEknown are the results of the best known algorithm. For the t-test comparison, we selected the algorithm with the lowest average SSE value, and for the Wilcoxon–Mann–Whitney U test comparison, we selected the algorithm with the lowest SSE median value. For both tests, we calculated the p-values (probability of accepting the null hypothesis), see pt for the t-test and pU for the Wilcoxon–Mann–Whitney U test in Table 1, respectively. At the selected significance level psig = 0.01, the null hypothesis is accepted if pt > 0.01 or pU > 0.01. Otherwise, the difference in algorithm results should be considered statistically significant. If the null hypothesis was accepted, we also tested the pair of one-tailed hypotheses SSEAdaptiveGreedy = SSEknown and SSEAdaptiveGreedy > SSEknown.
In some cases, the Wilcoxon–Mann–Whitney test shows the statistical significance of the differences in results, while the t-test does not confirm the benefits of the new algorithm. Figure 5 illustrates such a situation. Both algorithms demonstrate approximately the same results. Both algorithms periodically produce results that are far from the best SSE values, which is expressed in a sufficiently large value of the standard deviation. However, the results of the new algorithm are often slightly better, which is confirmed by the rank test.
In the comparative analysis of algorithm efficiency, the choice of the unit of time plays an important role. The astronomical time spent by an algorithm strongly depends on its implementation, the ability of the compiler to optimize the program code, and the fitness of the hardware to execute the code of a specific algorithm. Algorithms are often estimated by comparing the number of iterations performed (for example, the number of population generations for a GA) or the number of evaluations of the objective function.
However, the time consumption for a single iteration of a local search algorithm depends on the neighborhood type and number of elements in the neighborhood, and this dependence can be exponential. Therefore, comparing the number of iterations is unacceptable. Comparison of the objective function calculations is also not quite correct. Firstly, the Lloyd() procedure which consumes almost all of the processor time, does not calculate the objective function (1) directly. Secondly, during the operation of the greedy agglomerative procedure, the number of centroids changes (decreases from k + r down to k), and the time spent on computing the objective function also varies. Therefore, we nevertheless chose astronomical time as a scale for comparing algorithms. Moreover, all the algorithms use the same implementation of the Lloyd() algorithm launched under the same conditions.
In our computational experiments, the time limitation was used as the stop condition for all algorithms. For all data sets except the largest one, we have chosen a reasonable time limit to use the new algorithm in interactive modes. For IHEPC data and 50 clusters, a single run of the BasicGreedy() algorithm on the specified hardware took approximately 0.05 to 0.5 s. It is impossible to evaluate the comparative efficiency of the new algorithm in several iterations, since in this case, it does not have enough time to change the neighborhood parameter r at least once. We have increased the time to a few minutes. This time limit does not correspond to modern concepts of interactive modes of operation. Nevertheless, the rapid development of parallel computing requires the early creation of efficient algorithmic schemes. Our experiments were performed on a mass-market system. Advanced systems may cope with such large problems much faster.
As can be seen from Figure 6, the result of each algorithm depends on the elapsed time. Nevertheless, an advantage of the new algorithm is evident regardless of the chosen time limit.
To test the scalability of the proposed approach and the efficiency of the new algorithm on other hardware, we carried out additional experiments with NVIDIA GeForce 9600GT GPU, 2048 MB RAM, 336 GFLOPS. The declared performance of this simpler equipment is approximately 6 times lower. The results of experiments with proportional increase of time limitation are shown in Table 2. The difference with the results in Table 1 is obviously insignificant.
The ranges of SSE values in the majority of Table A1, Table A2, Table A3, Table A4, Table A5, Table A6, Table A7, Table A8, Table A9, Table A10 and Table A11 are narrow, nevertheless, the differences are statistically significant in several cases, see Table 1. In all cases, our new algorithm outperforms known ones or demonstrates approximately the same efficiency (difference in the results is statistically insignificant). Moreover, the new algorithm demonstrates the stability of its results (narrow range of objective function values).
Search results in both SWAPr and GREEDYr neighborhoods depend on a correct choice of parameter r (the number of replaced or added centroids). However, in general, local search algorithms with GREEDYr neighborhoods outperform the SWAPr neighborhood search. A simple reconnaissance search procedure enables the further improvement of the efficiency.

4. Discussion

The advantages of our algorithm are statistically significant for a large problem (IHEPC data), as well as for problems with a complex data structure (Mopsi-Joensuu and Mopsi-Finland data). The Mopsi data sets contain geographic coordinates of Mopsi users, which are extremely unevenly distributed in accordance with the natural organization of the urban environment, depending on street directions and urban infrastructure (Figure 4). In this case, the aim of clustering is to find some natural groups of users according to a geometric/geographic principle for assigning them to k service centers (hubs) such as shopping centers, bus stops, wireless network base stations, etc.
Geographical data sets often expose such a disadvantage of Lloyd's procedure as its inability to find a solution close to the exact one. Often, on such data, the value of the objective function found by Lloyd's procedure in the multi-start mode turns out to be many times greater than the values obtained by other algorithms, such as J-Means or RVNS algorithms with SWAP neighborhoods. As can be seen from Table A2, Table A3 and Table A5 in Appendix A, for such data, the GREEDYr neighborhood search provides significant advantages within a limited time, and our new self-adjusting AdaptiveGreedy() solver enhances these advantages.
The VNS algorithmic framework is useful for creating effective computational tools intended to solve complex practical problems. Embedding the most efficient types of neighborhoods in this framework depends on the problem type being solved. In problems such as k-means, the result of the search in neighborhoods with specific parameters strongly depends not only on the generalized numerical parameters of the problems, such as the number of clusters, the number of data vectors, and the search space dimensionality, but also on the internal data structure. In general, the comparative efficiency of the search in GREEDYr neighborhoods for certain types of practical problems and for specific data sets remains an open question. Nevertheless, the algorithm presented in this work, which automatically adjusts the most important parameter of such neighborhoods, enables its user to obtain the best result which the variable neighborhood search in GREEDYr neighborhoods is able to provide, without preliminary experiments in all possible GREEDYr neighborhoods. Thus, the new algorithm is a more versatile computational tool in comparison with the known VNS algorithms.
Greedy agglomerative procedures are widely used as crossover operators in genetic algorithms [46,88,90,110]. In this case, most often, the “parent” solutions are merged completely to obtain an intermediate solution with an excessive number of centers or centroids [46,88], which corresponds to the search in the GREEDYk neighborhood (one of the crossed “parent” solutions acts as the parameter S2), although, other versions of the greedy agglomerative crossover operator are also possible [90,110]. Such algorithms successfully compete with the advanced local search algorithms discussed in this article.
Self-configuring evolutionary algorithms [126,127,128] have been widely used for solving various optimization problems. An important direction of the further research is to study the possibility of adjusting the parameter r in greedy agglomerative crossover operators of genetic algorithms. Such procedures with self-adjusting parameter r could lead to a further increase in the accuracy of solving the k-means problem with respect to the achieved value of the objective function. Such evolutionary algorithms could also involve a reconnaissance search, which would then continue by applying the greedy agglomerative crossover operator with r values chosen from the most favorable range.
In addition, the similarity of the problem statements of the k-means, k-medoids and k-median problems gives us reasonable hope that the same approaches to improving the accuracy of algorithms, including VNS algorithms that adjust the parameter r of neighborhoods similar to GREEDYr, will also be applicable to those problems.

5. Conclusions

The widespread introduction of machine learning methods into all spheres of life creates a need for algorithms that are not only fast but also as accurate as possible for the related optimization problems. As practice, including this study, shows, on some problems the most popular clustering algorithm yields results extremely far from the optimal solution of the k-means problem.
In this research, we introduced GREEDYr search neighborhoods and found that searching in both SWAP and GREEDYr neighborhoods has advantages over the simplest Lloyd’s procedure. However, the results strongly depend on the parameters of such neighborhoods, and the optimal values of these parameters differ significantly across test problems. Nevertheless, searching in GREEDYr neighborhoods outperforms searching in SWAP neighborhoods in terms of accuracy.
We hope that our new variable neighborhood search algorithm (solver) for GPUs, which is more versatile due to its self-adjusting capability and surpasses known algorithms in the accuracy of solving the k-means problem, will encourage researchers and practitioners in the field of machine learning to build competitive systems with the lowest possible error within a limited time. Such systems should be in demand for clustering geographic data, as well as for a wide range of problems where the cost of error is highest.

Author Contributions

Conceptualization, L.K. and I.R.; methodology, L.K.; software, L.K.; validation, I.R. and E.T.; formal analysis, I.R. and A.P.; investigation, I.R.; resources, L.K. and E.T.; data curation, I.R.; writing—original draft preparation, L.K. and I.R.; writing—review and editing, L.K., E.T., and A.P.; visualization, I.R.; supervision, L.K.; project administration, L.K.; funding acquisition, L.K. and A.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by The Ministry of Science and Higher Education of the Russian Federation, project No. FEFE-2020-0013.

Conflicts of Interest

The authors declare no conflict of interest. The funder had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
NP: Non-deterministic polynomial-time
MSSC: Minimum Sum-of-Squares Clustering
SSE: Sum of Squared Errors
ALA algorithm: Alternate Location-Allocation algorithm
VNS: Variable Neighborhood Search
GA: Genetic Algorithm
IBC: Information Bottleneck Clustering
VND: Variable Neighborhood Descent
RVNS: Randomized Variable Neighborhood Search
GPU: Graphics Processing Unit
CPU: Central Processing Unit
RAM: Random Access Memory
CUDA: Compute Unified Device Architecture
IHEPC: Individual Household Electric Power Consumption
Lloyd-MS: Lloyd’s procedure in a multi-start mode
J-Means-MS: J-Means algorithm in a multi-start mode (SWAP1 + Lloyd VND)
GREEDYr: A neighborhood formed by applying greedy agglomerative procedures with r excessive clusters, and the RVNS algorithm which combines search in such a neighborhood with Lloyd’s procedure
SWAPr: A neighborhood formed by replacing r centroids by data vectors, and the RVNS algorithm which combines search in such a neighborhood with Lloyd’s procedure
GH-VNS1: VNS algorithm with GREEDYr neighborhoods and GREEDY1 for the initial neighborhood type
GH-VNS2: VNS algorithm with GREEDYr neighborhoods and GREEDYrandom for the initial neighborhood type
GH-VNS3: VNS algorithm with GREEDYr neighborhoods and GREEDYk for the initial neighborhood type
GA-1: Genetic algorithm with the single-point crossover, real-valued genes encoded by centroid positions, and the uniform random mutation
AdaptiveGreedy: New algorithm proposed in this article

Appendix A. Results of Computational Experiments

Table A1. Comparative results for Mopsi-Joensuu data set. 6014 data vectors in ℝ2, k = 30 clusters, time limitation 5 s.
Algorithm or Neighborhood | Achieved SSE Summarized After 30 Runs
Min (Record) | Max (Worst) | Average | Median | Std.dev
Lloyd-MS35.571243.399339.118538.77182.9733
j-Means-MS18.407623.703220.339919.85331.8603
GREEDY118.325327.699021.455521.66293.1291
GREEDY218.325321.700819.377618.32541.6119
GREEDY318.314521.700718.581718.32540.9372
GREEDY518.325321.700718.512918.32540.7956
GREEDY718.325321.700818.566518.32550.9021
GREEDY1018.325321.701018.566618.32550.9021
GREEDY1218.325421.700918.585218.32560.9362
GREEDY1518.325418.325718.325518.32550.0001
GREEDY2018.325418.326318.325718.32570.0002
GREEDY2518.325418.325718.325518.32550.0001
GREEDY3018.325418.326118.325818.32580.0002
GH-VNS118.314718.325518.323818.32530.0039
GH-VNS218.325321.700819.377618.32541.6119
GH-VNS318.314621.680118.563418.32540.8971
SWAP1 (the best of SWAPr)18.908220.333019.408718.99670.6019
GA-118.647821.153119.955519.98770.6632
AdaptiveGreedy18.314618.325818.324018.32530.0037
Table A2. Comparative results for Mopsi-Joensuu data set. 6014 data vectors in ℝ2, k = 100 clusters, time limitation 5 s.
Algorithm or Neighborhood | Achieved SSE Summarized After 30 Runs
Min (Record) | Max (Worst) | Average | Median | Std.dev
Lloyd-MS23.164134.783427.552027.13833.6436
j-Means-MS1.762831.896211.18322.421611.7961
GREEDY120.670135.544728.997029.24295.0432
GREEDY22.826429.06829.97085.33639.6186
GREEDY32.669010.59984.14443.05882.2108
GREEDY51.96114.31282.73852.72990.6135
GREEDY72.08374.64432.87302.63580.7431
GREEDY101.97783.86352.56132.33040.6126
GREEDY121.78174.30232.56392.20090.8730
GREEDY151.95643.15672.38842.24410.3620
GREEDY201.79373.28092.45422.35000.4746
GREEDY251.95323.38742.41952.25750.5470
GREEDY301.92742.45802.17232.14580.2171
GREEDY501.89039.36752.80472.16142.0838
GREEDY751.78782.88552.17752.02720.4023
GREEDY1001.80212.29422.01581.98490.1860
GH-VNS12.876317.11397.31964.33415.7333
GH-VNS22.826429.06829.97085.33639.6186
GH-VNS31.76432.73572.05131.98220.2699
SWAP3 (the best of rand. SWAPr)4.973923.65729.01598.39074.1351
GA-14.892219.15438.59147.17644.1096
AdaptiveGreedy1.77592.32651.95781.92290.1523
Table A3. Comparative results for Mopsi-Joensuu data set. 6014 data vectors in ℝ2, k = 300 clusters, time limitation 5 s.
Algorithm or Neighborhood | Achieved SSE Summarized After 30 Runs
Min (Record) | Max (Worst) | Average | Median | Std.dev
Lloyd-MS4.178914.75709.11439.31193.0822
j-Means-MS7.011922.312614.277412.61995.5095
GREEDY17.165415.35009.61139.21762.5266
GREEDY24.989614.48398.91978.20133.3072
GREEDY35.896714.11108.32608.04412.2140
GREEDY52.911510.25365.80125.73052.2740
GREEDY72.60457.98684.42014.05481.4841
GREEDY102.54978.67584.17962.96391.8494
GREEDY122.07534.71343.03832.87770.8348
GREEDY151.89758.78903.86153.26611.8064
GREEDY201.18783.79442.45772.48820.9554
GREEDY251.16913.52991.84891.64070.7460
GREEDY301.11514.94252.37112.05821.1501
GREEDY501.35263.54711.86351.71140.6046
GREEDY751.05335.59151.91291.42611.2082
GREEDY1000.80472.03491.26021.19940.3811
GREEDY1500.62431.47550.87430.83010.2447
GREEDY2000.45551.01540.67460.58820.2103
GREEDY2500.47891.33680.72330.66950.2164
GREEDY3000.54741.04720.72280.66570.1419
GH-VNS11.62195.25283.04233.13321.0222
GH-VNS21.20738.61443.22282.35012.4014
GH-VNS30.43210.68380.60240.61390.0836
SWAP12 (the best of SWAP by median)2.60165.50383.62193.36121.0115
SWAP20 (the best of SWAP by avg.)2.16305.12353.49583.40760.8652
GA-15.491112.69508.87997.71812.5384
AdaptiveGreedy0.31280.63520.46720.46040.1026
Table A4. Comparative results for Mopsi-Finland data set. 13,467 data vectors in ℝ2, k = 30 clusters, time limitation 5 s.
Algorithm or Neighborhood | Achieved SSE Summarized After 30 Runs
Min (Record) | Max (Worst) | Average | Median | Std.dev
Lloyd-MS4.79217 × 10106.36078 × 10105.74896 × 10105.79836 × 10103.69760 × 109
j-Means-MS3.43535 × 10104.26830 × 10103.66069 × 10103.60666 × 10101.75725 × 109
GREEDY13.43195 × 10103.70609 × 10103.51052 × 10103.48431 × 10107.42636 × 108
GREEDY23.43194 × 10103.49405 × 10103.44496 × 10103.44140 × 10101.64360 × 108
GREEDY33.43195 × 10103.49411 × 10103.44474 × 10103.44140 × 10101.71131 × 108
GREEDY53.43195 × 10103.48411 × 10103.44663 × 10103.44141 × 10101.65153 × 108
GREEDY73.42531 × 10103.47610 × 10103.44091 × 10103.43504 × 10101.76023 × 108
GREEDY103.42560 × 10103.48824 × 10103.45106 × 10103.43573 × 10102.36526 × 108
GREEDY123.42606 × 10103.48822 × 10103.44507 × 10103.43901 × 10101.68986 × 108
GREEDY153.42931 × 10103.47817 × 10103.43874 × 10103.43901 × 10108.31510 × 107
GREEDY203.42954 × 10103.48826 × 10103.44186 × 10103.43905 × 10101.28972 × 108
GREEDY253.43877 × 10103.44951 × 10103.43982 × 10103.43907 × 10102.57320 × 107
GREEDY303.43900 × 10103.48967 × 10103.45169 × 10103.43979 × 10101.93565 × 108
GH-VNS13.42626 × 10103.48724 × 10103.45244 × 10103.44144 × 10102.00510 × 108
GH-VNS23.42528 × 10103.48723 × 10103.44086 × 10103.43474 × 10101.54771 × 108
GH-VNS33.42528 × 10103.47955 × 10103.43826 × 10103.43474 × 10101.02356 × 108
SWAP1 (the best of SWAPr)3.43199 × 10103.55777 × 10103.46821 × 10103.46056 × 10103.22711 × 108
GA-13.48343 × 10103.81846 × 10103.65004 × 10103.64415 × 10101.00523 × 109
AdaptiveGreedy3.42528 × 10103.47353 × 10103.43385 × 10103.43473 × 10101.03984 × 108
Table A5. Comparative results for Mopsi-Finland data set. 13,467 data vectors in ℝ2, k = 300 clusters, time limitation 5 s.
Algorithm or Neighborhood | Achieved SSE Summarized After 30 Runs
Min (Record) | Max (Worst) | Average | Median | Std.dev
Lloyd-MS5.41643 × 1096.89261 × 1096.25619 × 1096.24387 × 1093.23827 × 108
j-Means-MS6.75216 × 1081.38889 × 1098.92782 × 1088.35397 × 1081.86995 × 108
GREEDY14.08445 × 1099.07208 × 1095.89974 × 1095.59903 × 1091.47601 × 108
GREEDY21.11352 × 1092.10247 × 1091.59229 × 1091.69165 × 1092.89625 × 108
GREEDY39.63842 × 1082.15674 × 1091.61490 × 1091.60123 × 1093.06567 × 108
GREEDY59.11944 × 1082.36799 × 1091.66021 × 1091.70448 × 1093.68575 × 108
GREEDY71.17328 × 1092.44476 × 1091.77589 × 1091.80948 × 1092.68354 × 108
GREEDY101.14221 × 1092.00426 × 1091.67586 × 1091.69601 × 1092.14822 × 108
GREEDY129.41133 × 1082.28940 × 1091.59715 × 1091.62288 × 1093.01841 × 108
GREEDY158.86983 × 1082.29776 × 1091.53989 × 1091.43319 × 1093.70138 × 108
GREEDY201.02224 × 1092.11636 × 1091.62601 × 1091.64029 × 1092.45576 × 108
GREEDY259.07984 × 1081.87134 × 1091.42878 × 1091.42864 × 1092.74744 × 108
GREEDY308.44247 × 1082.22882 × 1091.50817 × 1091.56015 × 1093.52497 × 108
GREEDY507.98191 × 1081.68198 × 1091.26851 × 1091.17794 × 1092.67082 × 108
GREEDY756.97650 × 1081.74139 × 1091.16422 × 1091.16616 × 1092.82454 × 108
GREEDY1006.55465 × 1081.44162 × 1091.03643 × 1091.09001 × 1091.95246 × 108
GREEDY1505.94256 × 1081.45317 × 1098.88898 × 1087.96787 × 1082.33137 × 108
GREEDY2005.60885 × 1081.41411 × 1097.96908 × 1087.20282 × 1082.26191 × 108
GREEDY2505.58602 × 1081.13946 × 1097.58434 × 1086.81196 × 1081.65511 × 108
GREEDY3005.68646 × 1081.41338 × 1097.35067 × 1086.83004 × 1081.76126 × 108
GH-VNS11.40141 × 1092.86919 × 1092.16238 × 1092.10817 × 1093.42105 × 108
GH-VNS28.22679 × 1082.12228 × 1091.40322 × 1091.39457 × 1092.96599 × 108
GH-VNS35.33373 × 1087.29800 × 1085.74914 × 1085.48427 × 1085.05346 × 107
SWAP1 (the best of. SWAPr)6.69501 × 1089.06507 × 1087.48932 × 1087.35532 × 1086.74846 × 107
GA-14.54419 × 1097.11460 × 1095.67688 × 1095.61135 × 1095.99687 × 108
AdaptiveGreedy5.27254 × 1087.09410 × 1085.60867 × 1085.38952 × 1084.89257 × 107
Table A6. Comparative results for BIRCH3 data set. 10^5 data vectors in ℝ2, k = 100 clusters, time limitation 10 s.
Algorithm or Neighborhood | Achieved SSE Summarized After 30 Runs
Min (Record) | Max (Worst) | Average | Median | Std.dev
Lloyd-MS8.13022 × 10139.51129 × 10138.96327 × 10139.06147 × 10134.84194 × 1012
j-Means-MS4.14627 × 10136.25398 × 10134.78063 × 10134.55711 × 10136.89734 × 1012
GREEDY1 3.73299 × 10135.64559 × 10134.13352 × 10133.90845 × 10135.19021 × 1012
GREEDY2 3.71499 × 10133.72063 × 10133.71689 × 10133.71565 × 10132.44802 × 1010
GREEDY3 3.71518 × 10133.72643 × 10133.71840 × 10133.71545 × 10134.12818 × 1010
GREEDY5 3.71485 × 10133.72087 × 10133.71644 × 10133.71518 × 10132.22600 × 1010
GREEDY7 3.71518 × 10133.72267 × 10133.71755 × 10133.71658 × 10132.24845 × 1010
GREEDY10 3.71555 × 10133.72119 × 10133.71771 × 10133.71794 × 10131.90289 × 1010
GREEDY12 3.71556 × 10133.72954 × 10133.71892 × 10133.71693 × 10133.91673 × 1010
GREEDY15 3.71626 × 10133.72169 × 10133.71931 × 10133.71963 × 10131.86102 × 1010
GREEDY203.71600 × 10133.72638 × 10133.72118 × 10133.72153 × 10132.69206 × 1010
GREEDY253.72042 × 10133.72690 × 10133.72284 × 10133.72228 × 10132.14437 × 1010
GREEDY303.72180 × 10133.73554 × 10133.72586 × 10133.72471 × 10134.33818 × 1010
GREEDY503.72166 × 10133.76422 × 10133.73883 × 10133.73681 × 101316.1061 × 1010
GREEDY753.72399 × 10133.84870 × 10133.76286 × 10133.74750 × 101341.6632 × 1010
GREEDY1003.72530 × 10133.91589 × 10133.80730 × 10133.84482 × 101361.9706 × 1010
GH-VNS13.71914 × 10133.77527 × 10133.73186 × 10133.72562 × 101318.3590 × 1010
GH-VNS23.71568 × 10133.73791 × 10133.72116 × 10133.72051 × 10136.08081 × 1010
GH-VNS33.71619 × 10133.73487 × 10133.72387 × 10133.72282 × 10135.96618 × 1010
SWAP1 (the best of SWAPr)4.28705 × 10135.48014 × 10134.82383 × 10134.75120 × 10133.90128 × 1012
GA-13.84317 × 10134.08357 × 10133.97821 × 10133.97088 × 10137.43642 × 1011
AdaptiveGreedy3.71484 × 10133.72011 × 10133.71726 × 10133.71749 × 10132.02784 × 1010
Table A7. Comparative results for BIRCH3 data set. 10^5 data vectors in ℝ2, k = 300 clusters, time limitation 10 s.
Algorithm or Neighborhood | Achieved SSE Summarized After 30 Runs
Min (Record) | Max (Worst) | Average | Median | Std.dev
Lloyd-MS3.49605 × 10134.10899 × 10133.74773 × 10133.77191 × 10132.32012 × 1012
j-Means-MS1.58234 × 10132.02926 × 10131.75530 × 10131.70507 × 10131.43885 × 1012
GREEDY11.48735 × 10132.63695 × 10131.71372 × 10131.60354 × 10132.98555 × 1012
GREEDY21.31247 × 10131.45481 × 10131.37228 × 10131.36745 × 10134.01697 × 1011
GREEDY31.34995 × 10131.49226 × 10131.39925 × 10131.39752 × 10134.85917 × 1011
GREEDY51.33072 × 10131.45757 × 10131.39069 × 10131.38264 × 10134.46890 × 1011
GREEDY71.34959 × 10131.49669 × 10131.41606 × 10131.41764 × 10134.92200 × 1011
GREEDY101.31295 × 10131.42722 × 10131.35970 × 10131.35318 × 10133.70511 × 1011
GREEDY121.32677 × 10131.49028 × 10131.35561 × 10131.33940 × 10134.44283 × 1011
GREEDY151.32077 × 10131.41079 × 10131.34102 × 10131.33832 × 10132.16247 × 1011
GREEDY201.31994 × 10131.43160 × 10131.35420 × 10131.34096 × 10133.43684 × 1011
GREEDY251.31078 × 10131.37699 × 10131.33571 × 10131.33040 × 10132.16378 × 1011
GREEDY301.32947 × 10131.45967 × 10131.37618 × 10131.36729 × 10133.92767 × 1011
GREEDY501.32284 × 10131.38691 × 10131.34840 × 10131.33345 × 10132.70770 × 1011
GREEDY751.30808 × 10131.33266 × 10131.31857 × 10131.31833 × 10137.22941 × 1010
GREEDY1001.30852 × 10131.32697 × 10131.31250 × 10131.31067 × 10134.94315 × 1010
GREEDY1501.30754 × 10131.31446 × 10131.30971 × 10131.30952 × 10131.82873 × 1010
GREEDY2001.30773 × 10131.31172 × 10131.30916 × 10131.30912 × 10131.08001 × 1010
GREEDY2501.30699 × 10131.31073 × 10131.30944 × 10131.30990 × 10131.18367 × 1010
GREEDY3001.30684 × 10131.31068 × 10131.30917 × 10131.30933 × 10131.21748 × 1010
GH-VNS11.40452 × 10131.56256 × 10131.45212 × 10131.42545 × 101355.7231 × 1010
GH-VNS21.32287 × 10131.38727 × 10131.34654 × 10131.34568 × 10132.01065 × 1011
GH-VNS31.30996 × 10131.31378 × 10131.31158 × 10131.31138 × 10131.44998 × 1010
SWAP2 (the best of SWAPr by median)2.18532 × 10133.25705 × 10132.54268 × 10132.37312 × 10133.78491 × 1012
SWAP7 (the best of SWAPr by avg.)2.24957 × 10132.86883 × 10132.46775 × 10132.47301 × 10131.51198 × 1012
GA-11.38160 × 10131.71472 × 10131.55644 × 10131.54336 × 10139.21217 × 1011
AdaptiveGreedy1.30807 × 10131.31113 × 10131.30922 × 10131.30925 × 10130.87731 × 1010
Table A8. Comparative results for S1 data set. 5000 data vectors in ℝ2, k = 15 clusters, time limitation 1 s.
Algorithm or Neighborhood | Achieved SSE Summarized After 30 Runs
Min (Record) | Max (Worst) | Average | Median | Std.dev
Lloyd-MS8.91703 × 10128.91707 × 10128.91704 × 10128.91703 × 10121.31098 × 107
j-Means-MS8.91703 × 101214.2907 × 101212.1154 × 101213.3667 × 10122.38947 × 1012
GREEDY18.91703 × 101213.2502 × 10129.27814 × 10128.91703 × 10121.25086 × 1012
GREEDY28.91703 × 10128.91703 × 10128.91703 × 10128.91703 × 10120.00000
GREEDY38.91703 × 10128.91703 × 10128.91703 × 10128.91703 × 10120.00000
GREEDY58.91703 × 10128.91703 × 10128.91703 × 10128.91703 × 10124.03023 × 105
GREEDY78.91703 × 10128.91703 × 10128.91703 × 10128.91703 × 10124.87232 × 105
GREEDY108.91703 × 10128.91703 × 10128.91703 × 10128.91703 × 10125.12234 × 105
GREEDY128.91703 × 10128.91703 × 10128.91703 × 10128.91703 × 10123.16158 × 105
GREEDY158.91703 × 10128.91703 × 10128.91703 × 10128.91703 × 10125.01968 × 105
GH-VNS18.91703 × 10128.91703 × 10128.91703 × 10128.91703 × 10120.00000
GH-VNS28.91703 × 10128.91703 × 10128.91703 × 10128.91703 × 10120.00000
GH-VNS38.91703 × 10128.91703 × 10128.91703 × 10128.91703 × 10124.03023 × 105
SWAP1 (the best of SWAP)8.91703 × 10128.91709 × 10128.91704 × 10128.91703 × 10128.67594 × 106
GA-18.91703 × 10128.91707 × 10128.91703 × 10128.91703 × 10129.04519 × 106
AdaptiveGreedy8.91703 × 10128.91703 × 10128.91703 × 10128.91703 × 10120.00000
Table A9. Comparative results for S1 data set. 5000 data vectors in ℝ2, k = 50 clusters, time limitation 1 s.
Algorithm or Neighborhood | Achieved SSE Summarized After 30 Runs
Min (Record) | Max (Worst) | Average | Median | Std.dev
Lloyd-MS3.94212 × 10124.06133 × 10123.99806 × 10123.99730 × 10124.52976 × 1010
j-Means-MS3.96626 × 10124.40078 × 10124.12311 × 10124.07123 × 101214.81090 × 1010
GREEDY13.82369 × 10124.19102 × 10123.91601 × 10123.88108 × 10129.82433 × 1010
GREEDY23.74350 × 10123.76202 × 10123.75014 × 10123.74936 × 10126.10139 × 109
GREEDY33.74776 × 10123.76237 × 10123.75455 × 10123.75456 × 10125.24513 × 109
GREEDY53.74390 × 10123.77031 × 10123.75345 × 10123.75298 × 10127.17733 × 109
GREEDY73.74446 × 10123.77208 × 10123.75277 × 10123.75190 × 10127.40052 × 109
GREEDY103.74493 × 10123.76031 × 10123.75159 × 10123.75185 × 10125.26553 × 109
GREEDY153.74472 × 10123.77922 × 10123.75426 × 10123.75519 × 10129.79855 × 109
GREEDY203.75028 × 10123.76448 × 10123.75586 × 10123.75573 × 10123.97310 × 109
GREEDY253.74770 × 10123.76224 × 10123.75500 × 10123.75572 × 10124.95370 × 109
GREEDY303.75014 × 10123.76010 × 10123.75583 × 10123.75661 × 10123.45280 × 109
GREEDY503.74676 × 10123.77396 × 10123.76021 × 10123.75933 × 10129.09159 × 109
GH-VNS13.74310 × 10123.76674 × 10123.74911 × 10123.74580 × 10126.99859 × 109
GH-VNS23.75106 × 10123.77369 × 10123.75792 × 10123.75782 × 10126.67960 × 109
GH-VNS33.75923 × 10123.77964 × 10123.76722 × 10123.76812 × 10126.00125 × 109
SWAP3 (the best of SWAP)3.75128 × 10123.79170 × 10123.77853 × 10123.77214 × 10124.53608 × 109
GA-13.84979 × 10123.99291 × 10123.92266 × 10123.92818 × 10124.56845 × 1012
AdaptiveGreedy3.74340 × 10123.76313 × 10123.74851 × 10123.75037 × 10125.56298 × 109
Table A10. Comparative results for Individual Household Electric Power Consumption (IHEPC) data set. 2,075,259 data vectors in ℝ7, k = 15 clusters, time limitation 5 min.
Algorithm or Neighborhood | Achieved SSE Summarized After 30 Runs
Min (Record) | Max (Worst) | Average | Median | Std.dev
Lloyd-MS12,874.865212,880.070312,876.021912,874.86522.2952
j-Means-MS12,874.865213,118.645512,984.708112,962.132375.6539
all GREEDY1-15 (equal results)12,874.863312,874.863312,874.863312,874.86330.0000
GH-VNS112,874.863312,874.863312,874.863312,874.86330.0000
GH-VNS212,874.863312,874.863312,874.863312,874.86330.0000
GH-VNS312,874.863312,874.863312,874.863312,874.86330.0000
GA-112,874.864312,874.865212,874.864412,874.86430.0004
AdaptiveGreedy12,874.863312,874.863312,874.863312,874.86330.0000
Table A11. Comparative results for Individual Household Electric Power Consumption (IHEPC) data set. 2,075,259 data vectors in ℝ7, k = 50 clusters, time limitation 5 min.
Algorithm or Neighborhood | Achieved SSE Summarized After 30 Runs
Min (Record) | Max (Worst) | Average | Median | Std.dev
Lloyd-MS5605.06255751.19825671.08205660.442954.2467
j-Means-MS5160.27006280.64405496.65395203.5679493.7311
GREEDY15200.92685431.36475287.41015281.730077.0460
GREEDY25167.14825283.38945171.65095192.12747.7203
GREEDY35155.51665178.40635166.53605164.60458.1580
GREEDY55164.60405178.43365170.88295174.09386.0904
GREEDY75162.53815178.12695168.72185171.82926.4518
GREEDY105154.20175176.45025162.04605160.40147.2029
GREEDY125162.87155181.02815166.89525165.32956.0172
GREEDY155163.25005181.13335167.33855165.80375.7910
GREEDY205156.28525176.68555166.20135164.63237.8749
GREEDY255166.98205181.85295175.03175176.21366.1471
GREEDY305168.63095182.43515175.24145176.45126.4635
GREEDY505168.38875182.43215177.52495177.68555.4437
GH-VNS15155.51665164.63135158.65495157.68123.7467
GH-VNS25159.88185176.68555167.33655166.95125.6808
GH-VNS35171.29695182.43215175.04685174.07523.6942
GA-15215.95215248.45215230.28395226.038613.2694
AdaptiveGreedy5153.56405163.93165157.08225155.51983.6034

References

  1. Berkhin, P. Survey of Clustering Data Mining Techniques; Accrue Software: New York, NY, USA, 2002. [Google Scholar]
  2. Cormack, R.M. A Review of Classification. J. R. Stat. Soc. Ser. A 1971, 134, 321–367. [Google Scholar] [CrossRef]
  3. Tsai, C.Y.; Chiu, C.C. A VNS-based hierarchical clustering method. In Proceedings of the 5th WSEAS International Conference on Computational Intelligence, Man-Machine Systems and Cybernetics (CIMMACS’06), Venice, Italy, 20–22 November 2006; World Scientific and Engineering Academy and Society (WSEAS): Stevens Point, WI, USA, 2006; pp. 268–275. [Google Scholar]
  4. Lloyd, S.P. Least Squares Quantization in PCM. IEEE Trans. Inf. Theory 1982, 28, 129–137. [Google Scholar] [CrossRef]
  5. MacQueen, J.B. Some Methods of Classification and Analysis of Multivariate Observations. In Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, CA, USA, 21 June–18 July 1965 and 27 December 1965–7 January 1966; Volume 1, pp. 281–297. [Google Scholar]
  6. Drineas, P.; Frieze, A.; Kannan, R.; Vempala, S.; Vinay, V. Clustering large graphs via the singular value decomposition. Mach. Learn. 2004, 56, 9–33. [Google Scholar] [CrossRef] [Green Version]
  7. Gu, Y.; Li, K.; Guo, Z.; Wang, Y. Semi-supervised k-means ddos detection method using hybrid feature selection algorithm. IEEE Access 2019, 7, 351–365. [Google Scholar] [CrossRef]
  8. Guo, X.; Zhang, X.; He, Y.; Jin, Y.; Qin, H.; Azhar, M.; Huang, J.Z. A Robust k-Means Clustering Algorithm Based on Observation Point Mechanism. Complexity 2020, 2020, 3650926. [Google Scholar] [CrossRef]
  9. Milligan, G.W. Clustering validation: Results and implications for applied analyses. In Clustering and Classification; Arabie, P., Hubert, L.J., Soete, G., Eds.; World Scientific: River Edge, NJ, USA, 1996; pp. 341–375. [Google Scholar]
  10. Steinley, D.; Brusco, M. Choosing the Number of Clusters in K-Means Clustering. Psychol. Methods 2011, 16, 285–297. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  11. Garey, M.; Johnson, D.; Witsenhausen, H. The complexity of the generalized Lloyd–Max problem (Corresp.). IEEE Trans. Inf. Theory 1982, 28, 255–256. [Google Scholar] [CrossRef]
  12. Aloise, D.; Deshpande, A.; Hansen, P.; Popat, P. NP-hardness of Euclidean sum-of-squares clustering. Mach. Learn. 2009, 75, 245–248. [Google Scholar] [CrossRef] [Green Version]
  13. Cooper, L. Heuristic methods for location-allocation problems. SIAM Rev. 1964, 6, 37–53. [Google Scholar] [CrossRef]
  14. Jiang, J.L.; Yuan, X.M. A heuristic algorithm for constrained multi-source Weber problem. The variational inequality approach. Eur. J. Oper. Res. 2007, 187, 357–370. [Google Scholar] [CrossRef]
  15. Arthur, D.; Manthey, B.; Roglin, H. k-Means Has Polynomial Smoothed Complexity. In Proceedings of the 2009 50th Annual IEEE Symposium on Foundations of Computer Science (FOCS’09), Atlanta, GA, USA, 25–27 October 2009; IEEE Computer Society: Washington, DC, USA, 2009; pp. 405–414. [Google Scholar] [CrossRef] [Green Version]
  16. Sabin, M.J.; Gray, R.M. Global convergence and empirical consistency of the generalized Lloyd algorithm. IEEE Trans. Inf. Theory 1986, 32, 148–155. [Google Scholar] [CrossRef]
  17. Emelianenko, M.; Ju, L.; Rand, A. Nondegeneracy and Weak Global Convergence of the Lloyd Algorithm in Rd. SIAM J. Numer. Anal. 2009, 46, 1423–1441. [Google Scholar] [CrossRef]
  18. Pham, D.T.; Afify, A.A. Clustering techniques and their applications in engineering. Proceedings of the Institution of Mechanical Engineers, Part C. J. Mech. Eng. Sci. 2007, 221, 1445–1459. [Google Scholar] [CrossRef]
  19. Fisher, D.; Xu, L.; Carnes, J.R.; Reich, Y.; Fenves, J.; Chen, J.; Shiavi, R.; Biswas, G.; Weinberg, J. Applying AI clustering to engineering tasks. IEEE Expert 1993, 8, 51–60. [Google Scholar] [CrossRef]
  20. Gheorghe, G.; Cartina, G.; Rotaru, F. Using K-Means Clustering Method in Determination of the Energy Losses Levels from Electric Distribution Systems. In Proceedings of the International Conference on Mathematical Methods and Computational Techniques in Electrical Engineering, Timisoara, Romania, 21–23 October 2010; pp. 52–56. [Google Scholar]
  21. Kersten, P.R.; Lee, J.S.; Ainsworth, T.L. Unsupervised classification of polarimetric synthetic aperture radar images using fuzzy clustering and EM clustering. IEEE Trans. Geosci. Remote Sens. 2005, 43, 519–527. [Google Scholar] [CrossRef]
  22. Cesarotti, V.; Rossi, L.; Santoro, R. A neural network clustering model for miscellaneous components production planning. Prod. Plan. Control 1999, 10, 305–316. [Google Scholar] [CrossRef]
  23. Kundu, B.; White, K.P., Jr.; Mastrangelo, C. Defect clustering and classification for semiconductor devices. In Proceedings of the 45th Midwest Symposium on Circuits and Systems, Tulsa, Oklahoma, 4–7 August 2002; Volume 2, pp. II-561–II-564. [Google Scholar] [CrossRef]
  24. Vernet, A.; Kopp, G.A. Classification of turbulent flow patterns with fuzzy clustering. Eng. Appl. Artif. Intell. 2002, 15, 315–326. [Google Scholar] [CrossRef]
  25. Afify, A.A.; Dimov, S.; Naim, M.M.; Valeva, V. Detecting cyclical disturbances in supply networks using data mining techniques. In Proceedings of the 2nd European Conference on Management of Technology, Birmingham, UK, 10–12 September 2006; pp. 1–8. [Google Scholar] [CrossRef] [Green Version]
  26. Jain, A.K.; Murty, M.N.; Flynn, P.J. Data clustering: A review. ACM Comput. Surv. 1999, 31, 264–323. [Google Scholar] [CrossRef]
  27. Naranjo, J.E.; Saha, R.; Tariq, M.T.; Hadi, M.; Xiao, Y. Pattern Recognition Using Clustering Analysis to Support Transportation System Management, Operations, and Modeling. J. Adv. Transp. 2019. [Google Scholar] [CrossRef]
  28. Kadir, R.A.; Shima, Y.; Sulaiman, R.; Ali, F. Clustering of public transport operation using K-means. In Proceedings of the 2018 IEEE 3rd International Conference on Big Data Analysis (ICBDA), Shanghai, China, 9–12 March 2018; pp. 427–532. [Google Scholar]
  29. Sesham, A.; Padmanabham, P.; Govardhan, A. Application of Factor Analysis to k-means Clustering Algorithm on Transportation Data. IJCA 2014, 95, 40–46. [Google Scholar] [CrossRef]
  30. Deb Nath, R.P.; Lee, H.J.; Chowdhury, N.K.; Chang, J.W. Modified K-Means Clustering for Travel Time Prediction Based on Historical Traffic Data. LNCS 2010, 6276, 511–521. [Google Scholar] [CrossRef]
  31. Montazeri-Gh, M.; Fotouhi, A. Traffic condition recognition using the k-means clustering method. Sci. Iran. 2011, 18, 930–937. [Google Scholar] [CrossRef] [Green Version]
  32. Farahani, R.Z.; Hekmatfar, M. Facility Location Concepts, Models, Algorithms and Case Studies; Springer: Berlin/Heidelberg, Germany, 2009. [Google Scholar] [CrossRef]
  33. Drezner, Z.; Hamacher, H. Facility Location: Applications and Theory; Springer: Berlin, Germany, 2004; pp. 119–143. [Google Scholar]
  34. Klastorin, T.D. The p-Median Problem for Cluster Analysis: A Comparative Test Using the Mixture Model Approach. Manag. Sci. 1985, 31, 84–95. [Google Scholar] [CrossRef]
  35. Brusco, M.J.; Kohn, H.F. Optimal Partitioning of a Data Set Based on the p-Median Model. Psychometrica 2008, 73, 89–105. [Google Scholar] [CrossRef]
  36. Kaufman, L.; Rousseeuw, P.J. Clustering by means of Medoids. In Statistical Data Analysis Based on the L1–Norm and Related Methods; Dodge, Y., Ed.; Birkhäuser Basel: Basel, Switzerland, 1987; pp. 405–416. [Google Scholar]
  37. Schubert, E.; Rousseeuw, P. Faster k-Medoids Clustering: Improving the PAM, CLARA, and CLARANS Algorithms. arXiv 2019, arXiv:1810.05691. [Google Scholar]
  38. Park, H.S.; Jun, C.H. A simple and fast algorithm for K-medoids clustering. Expert Syst. Appl. 2009, 36, 3336–3341. [Google Scholar] [CrossRef]
  39. Hakimi, S.L. Optimum Locations of Switching Centers and the Absolute Centers and Medians of a Graph. Oper. Res. 1964, 12, 450–459. [Google Scholar] [CrossRef]
  40. Masuyama, S.; Ibaraki, T.; Hasegawa, T. The Computational Complexity of the m-Center Problems on the Plane. Trans. Inst. Electron. Commun. Eng. Japan 1981, 64E, 57–64. [Google Scholar]
  41. Kariv, O.; Hakimi, S.L. An Algorithmic Approach to Network Location Problems. II: The P medians. SIAM J. Appl. Math. 1979, 37, 539–560. [Google Scholar] [CrossRef]
  42. Kuenne, R.E.; Soland, R.M. Exact and approximate solutions to the multisource Weber problem. Math. Program. 1972, 3, 193–209. [Google Scholar] [CrossRef]
  43. Ostresh, L.M., Jr. The Stepwise LocationAllocation Problem: Exact Solutions in Continuous and Discrete Spaces. Geogr. Anal. 1978, 10, 174–185. [Google Scholar] [CrossRef]
  44. Rosing, K.E. An optimal method for solving the (generalized) multi-Weber problem. Eur. J. Oper. Res. 1992, 58, 414–426. [Google Scholar] [CrossRef]
  45. Blum, C.; Roli, A. Metaheuristics in combinatorial optimization: Overview and conceptual comparison. ACM Comput. Surv. 2001, 35, 268–308. [Google Scholar] [CrossRef]
  46. Neema, M.N.; Maniruzzaman, K.M.; Ohgai, A. New Genetic Algorithms Based Approaches to Continuous p-Median Problem. Netw. Spat. Econ. 2011, 11, 83–99. [Google Scholar] [CrossRef]
  47. Hoos, H.H.; Stutzle, T. Stochastic Local Search Foundations and Applications; Springer: Berlin, Germany, 2005. [Google Scholar]
  48. Bang-Jensen, J.; Chiarandini, M.; Goegebeur, Y.; Jorgensen, B. Mixed Models for the Analysis of Local Search Components. In Proceedings of the Engineering Stochastic Local Search Algorithms International Workshop, Brussels, Belgium, 6–8 September 2007; pp. 91–105. [Google Scholar]
  49. Cohen-Addad, V.; Mathieu, C. Effectiveness of local search for geometric optimization. In Proceedings of the 31st International Symposium on Computational Geometry, SoCG-2015, Eindhoven, The Netherlands, 22–25 June 2015; pp. 329–343. [Google Scholar]
  50. Kochetov, Y.; Mladenović, N.; Hansen, P. Local search with alternating neighborhoods. Discret. Anal. Oper. Res. 2003, 2, 11–43. (In Russian) [Google Scholar]
  51. Kanungo, T.; Mount, D.M.; Netanyahu, N.S.; Piatko, C.D.; Silverman, R.; Wu, A.Y. A local search approximation algorithm for k-means clustering. Comput. Geom. Theory Appl. 2004, 28, 89–112. [Google Scholar] [CrossRef]
  52. Page, E.S. On Monte Carlo methods in congestion problems. I: Searching for an optimum in discrete situations. Oper. Res. 1965, 13, 291–299. [Google Scholar] [CrossRef]
  53. Hromkovic, J. Algorithmics for Hard Problems: Introduction to Combinatorial Optimization, Randomization, Approximation, and Heuristics; Springer: Berlin/Heidelberg, Germany, 2011. [Google Scholar]
  54. Ng, T. Expanding Neighborhood Tabu Search for facility location problems in water infrastructure planning. In Proceedings of the 2014 IEEE International Conference on Systems, Man, and Cybernetics (SMC), San Diego, CA, USA, 5–8 October 2014; pp. 3851–3854. [Google Scholar] [CrossRef]
  55. Mladenovic, N.; Brimberg, J.; Hansen, P.; Moreno-Perez, J.A. The p-median problem: A survey of metaheuristic approaches. Eur. J. Oper. Res. 2007, 179, 927–939. [Google Scholar] [CrossRef] [Green Version]
  56. Reese, J. Solution methods for the p-median problem: An annotated bibliography. Networks 2006, 48, 125–142. [Google Scholar] [CrossRef]
  57. Brimberg, J.; Drezner, Z.; Mladenovic, N.; Salhi, S. A New Local Search for Continuous Location Problems. Eur. J. Oper. Res. 2014, 232, 256–265. [Google Scholar] [CrossRef] [Green Version]
  58. Drezner, Z.; Brimberg, J.; Mladenovic, N.; Salhi, S. New heuristic algorithms for solving the planar p-median problem. Comput. Oper. Res. 2015, 62, 296–304. [Google Scholar] [CrossRef]
  59. Drezner, Z.; Brimberg, J.; Mladenovic, N.; Salhi, S. Solving the planar p-median problem by variable neighborhood and concentric searches. J. Glob. Optim. 2015, 63, 501–514. [Google Scholar] [CrossRef] [Green Version]
  60. Arthur, D.; Vassilvitskii, S. k-Means++: The Advantages of Careful Seeding. In Proceedings of the SODA’07, SIAM, New Orleans, LA, USA, 7–9 January 2007; pp. 1027–1035. [Google Scholar]
  61. Bradley, P.S.; Fayyad, U.M. Refining Initial Points for K-Means Clustering. In Proceedings of the Fifteenth International Conference on Machine Learning (ICML ‘98), Madison, WI, USA, 24–27 July 1998; Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 1998; pp. 91–99. [Google Scholar]
  62. Bhusare, B.B.; Bansode, S.M. Centroids Initialization for K-Means Clustering using Improved Pillar Algorithm. Int. J. Adv. Res. Comput. Eng. Technol. 2014, 3, 1317–1322. [Google Scholar]
  63. Yang, J.; Wang, J. Tag clustering algorithm lmmsk: Improved k-means algorithm based on latent semantic analysis. J. Syst. Electron. 2017, 28, 374–384. [Google Scholar]
  64. Mishra, N.; Oblinger, D.; Pitt, L. Sublinear time approximate clustering. In Proceedings of the 12th SODA, Washington, DC, USA, 7–9 January 2001; pp. 439–447. [Google Scholar]
  65. Eisenbrand, F.; Grandoni, F.; Rothvosz, T.; Schafer, G. Approximating connected facility location problems via random facility sampling and core detouring. In Proceedings of the SODA’2008, San Francisco, CA, USA, 20–22 January 2008; ACM: New York, NY, USA, 2008; pp. 1174–1183. [Google Scholar] [CrossRef]
  66. Jaiswal, R.A.; Kumar, A.; Sen, S. Simple D2-Sampling Based PTAS for k-Means and Other Clustering Problems. Algorithmica 2014, 70, 22–46. [Google Scholar] [CrossRef] [Green Version]
  67. Avella, P.; Boccia, M.; Salerno, S.; Vasilyev, I. An Aggregation Heuristic for Large Scale p-median Problem. Comput. Oper. Res. 2012, 39, 1625–1632. [Google Scholar] [CrossRef]
  68. Kaufman, L.; Rousseeuw, P.J. Finding Groups in Data: An Introduction to Cluster Analysis; Wiley: New York, NY, USA, 1990. [Google Scholar]
  69. Francis, R.L.; Lowe, T.J.; Rayco, M.B.; Tamir, A. Aggregation error for location models: Survey and analysis. Ann. Oper. Res. 2009, 167, 171–208. [Google Scholar] [CrossRef]
  70. Pelleg, D.; Moore, A. Accelerating Exact k-Means with Geometric Reasoning [Technical Report CMU-CS-00-105]; Carnegie Melon University: Pittsburgh, PA, USA, 2000. [Google Scholar]
  71. Borgelt, C. Even Faster Exact k-Means Clustering. LNCS 2020, 12080, 93–105. [Google Scholar] [CrossRef] [Green Version]
  72. Lai, J.Z.C.; Huang, T.J.; Liaw, Y.C. A Fast k-Means Clustering Algorithm Using Cluster Center Displacement. Pattern Recognit. 2009, 42, 2551–2556. [Google Scholar] [CrossRef]
  73. Mladenovic, N.; Hansen, P. Variable Neighborhood Search. Comput. Oper. Res. 1997, 24, 1097–1100. [Google Scholar] [CrossRef]
  74. Hansen, P. Variable Neighborhood Search. Search Methodology. In Search Metodologies; Bruke, E.K., Kendall, G., Eds.; Springer: New York, NY, USA, 2005; pp. 211–238. [Google Scholar] [CrossRef] [Green Version]
  75. Hansen, P.; Mladenovic, N. Variable Neighborhood Search. In Handbook of Heuristics; Martí, R., Pardalos, P., Resende, M., Eds.; Springer: Cham, Switzerland, 2018. [Google Scholar] [CrossRef]
  76. Brimberg, J.; Hansen, P.; Mladenovic, N. Attraction Probabilities in Variable Neighborhood Search. 4OR-Q. J. Oper. Res 2010, 8, 181–194. [Google Scholar] [CrossRef]
  77. Hansen, P.; Mladenovic, N.; Perez, J.A.M. Variable Neighborhood Search: Methods and Applications. 4OR-Q. J. Oper. Res. 2008, 6, 319–360. [Google Scholar] [CrossRef]
  78. Hansen, P.; Brimberg, J.; Urosevic, D.; Mladenovic, N. Solving Large p-Median Clustering Problems by Primal Dual Variable Neighborhood Search. Data Min. Knowl. Discov. 2009, 19, 351–375. [Google Scholar] [CrossRef]
  79. Rozhnov, I.P.; Orlov, V.I.; Kazakovtsev, L.A. VNS-Based Algorithms for the Centroid-Based Clustering Problem. Facta Univ. Ser. Math. Inform. 2019, 34, 957–972. [Google Scholar]
  80. Hansen, P.; Mladenovic, N. J-Means: A new local search heuristic for minimum sum-of-squares clustering. Pattern Recognit. 2001, 34, 405–413. [Google Scholar] [CrossRef]
  81. Martins, P. Goal Clustering: VNS Based Heuristics. Available online: https://arxiv.org/abs/1705.07666v4 (accessed on 24 October 2020).
  82. Carrizosa, E.; Mladenovic, N.; Todosijevic, R. Variable neighborhood search for minimum sum-of-squares clustering on networks. Eur. J. Oper. Res. 2013, 230, 356–363. [Google Scholar] [CrossRef]
  83. Roux, M. A Comparative Study of Divisive and Agglomerative Hierarchical Clustering Algorithms. J. Classif. 2018, 35, 345–366. [Google Scholar] [CrossRef] [Green Version]
  84. Sharma, A.; López, Y.; Tsunoda, T. Divisive hierarchical maximum likelihood clustering. BMC Bioinform. 2017, 18, 546. [Google Scholar] [CrossRef]
  85. Venkat Reddy, M.; Vivekananda, M.; Satish, R.U.V.N. Divisive Hierarchical Clustering with K-means and Agglomerative Hierarchical Clustering. IJCST 2017, 5, 6–11. [Google Scholar]
  86. Sun, Z.; Fox, G.; Gu, W.; Li, Z. A parallel clustering method combined information bottleneck theory and centroid-based clustering. J. Supercomput. 2014, 69, 452–467. [Google Scholar] [CrossRef]
  87. Kuehn, A.A.; Hamburger, M.J. A heuristic program for locating warehouses. Manag. Sci. 1963, 9, 643–666. [Google Scholar] [CrossRef]
  88. Alp, O.; Erkut, E.; Drezner, Z. An Efficient Genetic Algorithm for the p-Median Problem. Ann. Oper. Res. 2003, 122, 21–42. [Google Scholar] [CrossRef]
  89. Cheng, J.; Chen, X.; Yang, H.; Leng, M. An enhanced k-means algorithm using agglomerative hierarchical clustering strategy. In Proceedings of the International Conference on Automatic Control and Artificial Intelligence (ACAI 2012), Xiamen, China, 3–5 March 2012; pp. 407–410. [Google Scholar] [CrossRef]
  90. Kazakovtsev, L.A.; Antamoshkin, A.N. Genetic Algorithm with Fast Greedy Heuristic for Clustering and Location Problems. Informatica 2014, 3, 229–240. [Google Scholar]
  91. Pelleg, D.; Moore, A. X-means: Extending K-means with Efficient Estimation of the Number of Clusters. In Proceedings of the International Conference on Machine Learning ICML, Sydney, Australia, 8–12 July 2002. [Google Scholar]
  92. Ahmed, M.; Seraj, R.; Islam, S.M.S. The k-means Algorithm: A Comprehensive Survey and Performance Evaluation. Electronics 2020, 9, 1295. [Google Scholar] [CrossRef]
  93. Frackiewicz, M.; Mandrella, A.; Palus, H. Fast Color Quantization by K-Means Clustering Combined with Image Sampling. Symmetry 2019, 11, 963. [Google Scholar] [CrossRef] [Green Version]
  94. Zhang, G.; Li, Y.; Deng, X. K-Means Clustering-Based Electrical Equipment Identification for Smart Building Application. Information 2020, 11, 27. [Google Scholar] [CrossRef] [Green Version]
  95. Chen, F.; Yang, Y.; Xu, L.; Zhang, T.; Zhang, Y. Big-Data Clustering: K-Means or K-Indicators? 2019. Available online: https://arxiv.org/pdf/1906.00938.pdf (accessed on 18 October 2020).
  96. Qin, J.; Fu, W.; Gao, H.; Zheng, W.X. Distributed k-means algorithm and fuzzy c -means algorithm for sensor networks based on multiagent consensus theory. IEEE Trans. Cybern. 2016, 47, 772–783. [Google Scholar] [CrossRef]
  97. Shindler, M.; Wong, A.; Meyerson, A. Fast and accurate k-means for large datasets. In Proceedings of the 24th International Conference on Neural Information Processing Systems (NIPS’11), Sydney, Australia, 13–16 December 2011; Curran Associates Inc.: Red Hook, NY, USA, 2011; pp. 2375–2383. [Google Scholar]
  98. Hedar, A.R.; Ibrahim, A.M.M.; Abdel-Hakim, A.E.; Sewisy, A.A. K-Means Cloning: Adaptive Spherical K-Means Clustering. Algorithms 2018, 11, 151. [Google Scholar] [CrossRef] [Green Version]
  99. Xu, T.S.; Chiang, H.D.; Liu, G.Y.; Tan, C.W. Hierarchical k-means method for clustering large-scale advanced metering infrastructure data. IEEE Trans. Power Deliv. 2015, 32, 609–616. [Google Scholar] [CrossRef]
  100. Wang, X.D.; Chen, R.C.; Yan, F.; Zeng, Z.Q.; Hong, C.Q. Fast adaptive k-means subspace clustering for high-dimensional data. IEEE Access 2019, 7, 639–651. [Google Scholar] [CrossRef]
  101. Zechner, M.; Granitzer, M. Accelerating K-Means on the Graphics Processor via CUDA. In Proceedings of the International Conference on Intensive Applications and Services, Valencia, Spain, 20–25 April 2009; pp. 7–15. [Google Scholar] [CrossRef] [Green Version]
  102. Luebke, D.; Humphreys, G. How GPUs work. Computer 2007, 40, 96–110. [Google Scholar] [CrossRef]
  103. Maulik, U.; Bandyopadhyay, S. Genetic Algorithm-Based Clustering Technique. Pattern Recognit. 2000, 33, 1455–1465. [Google Scholar] [CrossRef]
  104. Krishna, K.; Murty, M. Genetic K-Means algorithm. IEEE Trans. Syst. Man Cybern. Part B 1999, 29, 433–439. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  105. Singh, N.; Singh, D.P.; Pant, B. ACOCA: Ant Colony Optimization Based Clustering Algorithm for Big Data Preprocessing. Int. J. Math. Eng. Manag. Sci. 2019, 4, 1239–1250. [Google Scholar] [CrossRef]
  106. Merwe, D.W.; Engelbrecht, A.P. Data Clustering Using Particle Swarm Optimization. In Proceedings of the 2003 Congress on Evolutionary Computation, Canberra, Australia, 8–12 December 2003; pp. 215–220. [Google Scholar]
  107. Nikolaev, A.; Mladenovic, N.; Todosijevic, R. J-means and I-means for minimum sum-of-squares clustering on networks. Optim. Lett. 2017, 11, 359–376. [Google Scholar] [CrossRef]
  108. Fränti, P.; Sieranoja, S. K-means properties on six clustering benchmark datasets. Appl. Intell. 2018, 48, 4743–4759. [Google Scholar] [CrossRef]
  109. Clustering Basic Benchmark. Available online: http://cs.joensuu.fi/sipu/datasets/ (accessed on 15 September 2020).
  110. Kazakovtsev, L.; Shkaberina, G.; Rozhnov, I.; Li, R.; Kazakovtsev, V. Genetic Algorithms with the Crossover-Like Mutation Operator for the k-Means Problem. CCIS 2020, 1275, 350–362. [Google Scholar] [CrossRef]
  111. Brimberg, J.; Mladenovic, N. A variable neighborhood algorithm for solving the continuous location-allocation problem. Stud. Locat. Anal. 1996, 10, 1–12. [Google Scholar]
  112. Miskovic, S.; Stanimirovich, Z.; Grujicic, I. An efficient variable neighborhood search for solving a robust dynamic facility location problem in emergency service network. Electron. Notes Discret. Math. 2015, 47, 261–268. [Google Scholar] [CrossRef]
  113. Crainic, T.G.; Gendreau, M.; Hansen, P.; Hoeb, N.; Mladenovic, N. Parallel variable neighbourhood search for the p-median. In Proceedings of the 4th Metaheuristics International conference MIC’2001, Porto, Portugal, 16–21 July 2001; pp. 595–599. [Google Scholar]
  114. Hansen, P.; Mladenovic, N. Variable neighborhood search for the p-median. Locat. Sci. 1997, 5, 207–226. [Google Scholar] [CrossRef]
  115. Wen, M.; Krapper, E.; Larsen, J.; Stidsen, T.K. A multilevel variable neighborhood search heuristic for a practical vehicle routing and driver scheduling problem. Networks 2011, 58, 311–323. [Google Scholar] [CrossRef] [Green Version]
  116. Baldassi, C. Recombinator-k-Means: Enhancing k-Means++ by Seeding from Pools of Previous Runs. Available online: https://arxiv.org/abs/1905.00531v1 (accessed on 18 September 2020).
  117. Duarte, A.; Mladenović, N.; Sánchez-Oro, J.; Todosijević, R. Variable Neighborhood Descent. In Handbook of Heuristics; Martí, R., Panos, P., Resende, M., Eds.; Springer: Cham, Switzerland, 2016. [Google Scholar] [CrossRef]
  118. Dua, D.; Graff, C. UCI Machine Learning Repository 2019. Available online: http://archive.ics.uci.edu/ml (accessed on 30 September 2020).
  119. Molla, M.M.; Nag, P.; Thohura, S.; Khan, A. A Graphics Process Unit-Based Multiple-Relaxation-Time Lattice Boltzmann Simulation of Non-Newtonian Fluid Flows in a Backward Facing Step. Computation 2020, 8, 83. [Google Scholar] [CrossRef]
  120. Kazakovtsev, L.A.; Rozhnov, I.P.; Popov, E.A.; Karaseva, M.V.; Stupina, A.A. Parallel implementation of the greedy heuristic clustering algorithms. IOP Conf. Ser. Mater. Sci. Eng. 2019, 537, 022052. [Google Scholar] [CrossRef]
  121. Zhang, T.; Ramakrishnan, R.; Livny, M. BIRCH: An Efficient Data Clustering Method for Very Large Databases. In Proceedings of the 1996 ACM SIGMOD International Conference on Management of data (SIGMOD’96), Montreal, QC, Canada, 4–6 June 1996; ACM: New York, NY, USA, 1996; pp. 103–114. [Google Scholar] [CrossRef]
  122. Smucker, M.D.; Allan, J.; Carterette, B.A. Comparison of Statistical Significance Tests for Information Retrieval. In Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management (CIKM ‘07), Lisbon, Portugal, 6–10 November 2007; ACM: New York, NY, USA, 2007; pp. 623–632. [Google Scholar]
  123. Park, H.M. Comparing Group Means: The t-Test and One-way ANOVA Using STATA, SAS, and SPSS; Indiana University: Bloomington, Indiana, 2009. [Google Scholar]
  124. Mann, H.B.; Whitney, D.R. On a Test of Whether one of Two Random Variables is Stochastically Larger than the other. Ann. Math. Stat. 1947, 18, 50–60. [Google Scholar] [CrossRef]
  125. Fay, M.P.; Proschan, M.A. Wilcoxon-Mann-Whitney or t-Test? On Assumptions for Hypothesis Tests and Multiple Interpretations of Decision Rules. Stat. Surv. 2010, 4, 1–39. [Google Scholar] [CrossRef]
  126. Burke, E.; Gendreau, M.; Hyde, M.; Kendall, G.; Ochoa, G.; Ozkan, E.; Qu, R. Hyper-heuristics: A survey of the state of the art. J. Oper. Res. Soc. 2013, 64, 1695–1724. [Google Scholar] [CrossRef] [Green Version]
  127. Stanovov, V.; Semenkin, E.; Semenkina, O. Self-configuring hybrid evolutionary algorithm for fuzzy imbalanced classification with adaptive instance selection. J. Artif. Intell. Soft Comput. Res. 2016, 6, 173–188. [Google Scholar] [CrossRef] [Green Version]
  128. Semenkina, M.; Semenkin, E. Hybrid Self-configuring Evolutionary Algorithm for Automated Design of Fuzzy Classifier. LNCS 2014, 8794, 310–317. [Google Scholar] [CrossRef]
Figure 1. Search in SWAPr neighborhoods. Dependence of the result on r: (a) BIRCH3 data set, 100 clusters, 10^5 data vectors, time limitation 10 s; (b–d) Mopsi-Joensuu data set, 30, 100 and 300 clusters, 6014 data vectors, time limitation 5 s.
Figure 2. Search in GREEDYr neighborhoods. Dependence of the result on r: (a) BIRCH3 data set, 100 clusters, 10^5 data vectors, time limitation 10 s; (b–d) Mopsi-Joensuu data set, 30, 100 and 300 clusters, 6014 data vectors, time limitation 5 s; (e,f) Mopsi-Finland data set, 100 and 300 clusters, 13,467 data vectors, time limitation 5 s.
Figure 3. Search in GREEDYr neighborhoods. Dependence of the result on r: Individual Household Electric Power Consumption (IHEPC) data set, 50 clusters, 2,075,259 data vectors, time limitation 5 min.
Figure 4. Mopsi-Joensuu data set visualization.
Figure 5. Frequency diagram of the results (our new algorithm vs. the best of other tested algorithms, GH-VNS3), Mopsi-Finland data set, 300 clusters, 13,467 data vectors, time limitation 5 s, 30 runs of each algorithm.
Figure 6. Comparative analysis of the convergence speed. Dependence of the median result on computation time for: (a) Individual Household Electric Power Consumption (IHEPC) data set, 50 clusters, 2,075,259 data vectors, time limitation 5 min; (b) Mopsi-Joensuu data set, 300 clusters, 6014 data vectors, time limitation 5 s; (c) Mopsi-Finland data set, 300 clusters, 13,467 data vectors, time limitation 5 s.
Table 1. Comparative results for all data sets (best of known algorithms vs. new algorithm).
Algorithm or Neighborhood | Achieved SSE Summarized After 30 Runs | p-Values and Statistical Significance of Difference in Results
Min (Record) | Max (Worst) | Average | Median | Std.dev
BIRCH3 data set. 10^5 data vectors in ℝ2, k = 300 clusters, time limitation 10 s
GREEDY2001.30773 × 10131.31172 × 10131.30916 × 10131.30912 × 10131.08001 × 1010pt = 0.4098↔
AdaptiveGreedy1.30807 × 10131.31113 × 10131.30922 × 10131.30925 × 10130.87731 × 1010pU = 0.2337⇔
BIRCH3 data set. 10^5 data vectors in ℝ2, k = 100 clusters, time limitation 10 s
GREEDY53.71485 × 10133.72087 × 10133.71644 × 10133.71518 × 10132.22600 × 1010pt = 0.0701↔
AdaptiveGreedy3.71484 × 10133.72011 × 10133.71726 × 10133.71749 × 10132.02784 × 1010pU = 0.1357⇔
Mopsi-Joensuu data set. 6014 data vectors in ℝ2, k = 300 clusters, time limitation 5 s
GH-VNS30.43210.68380.60240.61390.0836pU = 0.00005⇑
GREEDY2000.45551.01540.67460.58820.2163pt < 0.00001↑
AdaptiveGreedy0.31280.63520.46720.46040.1026
Mopsi-Joensuu data set. 6014 data vectors in ℝ2, k = 100 clusters, time limitation 5 s
GREEDY1001.80212.29422.01581.98490.1860pt = 0.0910↔
GH-VNS31.76432.73572.05131.98220.2699pU = 0.0042⇑
AdaptiveGreedy1.77592.32651.95781.92290.1523
Mopsi-Joensuu data set. 6014 data vectors in ℝ2, k = 30 clusters, time limitation 5 s
GH-VNS118.314718.325518.323818.32530.0039pt = 0.4118↔
AdaptiveGreedy18.314618.325818.324018.32530.0037pU = 0.2843⇔
Mopsi-Finland data set. 13,467 data vectors in ℝ2, k = 300 clusters, time limitation 5 s
GH-VNS35.33373 × 1087.29800 × 1085.74914 × 1085.48427 × 1085.05346 × 107pt = 0.1392↔
AdaptiveGreedy5.27254 × 1087.09410 × 1085.60867 × 1085.38952 × 1084.89257 × 107pU = 0.0049⇑
Mopsi-Finland data set. 13,467 data vectors in ℝ2, k = 30 clusters, time limitation 5 s
GH-VNS33.42528 × 10103.47955 × 10103.43826 × 10103.43474 × 10101.02356 × 108pt = 0.0520↔
AdaptiveGreedy3.42528 × 10103.47353 × 10103.43385 × 10103.43473 × 10101.03984 × 108pU = 0.0001⇑
S1 data set. 5000 data vectors in ℝ2, k = 15 clusters, time limitation 1 second
GH-VNS28.91703 × 10128.91703 × 10128.91703 × 10128.91703 × 10120.0000pt = 0.5↔
AdaptiveGreedy8.91703 × 10128.91703 × 10128.91703 × 10128.91703 × 10120.0000pU = 0.5⇔
S1 data set. 5000 data vectors in ℝ2, k = 50 clusters, time limitation 1 second
GH-VNS13.74310 × 10123.76674 × 10123.74911 × 10123.74580 × 10126.99859 × 109pt = 0.3571↔
AdaptiveGreedy3.74340 × 10123.76313 × 10123.74851 × 10123.75037 × 10125.56298 × 109pU = 0.28434⇔
IHEPC data set. 2,075,259 data vectors in ℝ7, k = 50 clusters, time limitation 5 min
GREEDY105154.20175176.45025162.04605160.40147.2029pt = 0.008↑
AdaptiveGreedy5153.56405163.93165157.08225155.51983.6034pU = 0.001⇑
Note: “↑”, “⇑”: the advantage of the new algorithm over known algorithms is statistically significant (“↑” for the t-test and “⇑” for the Mann–Whitney U test); “↓”, “⇓”: the disadvantage of the new algorithm over known algorithms is statistically significant; “↔”, “⇔”: the advantage or disadvantage is statistically insignificant. The significance level is 0.01.
Table 2. Additional benchmarking on NVIDIA GeForce 9600GT GPU. Comparative results for Mopsi-Finland data set. 13,467 data vectors in ℝ2, time limitation 30 s.
Algorithm or Neighborhood | Achieved SSE Summarized After 30 Runs
Min (Record) | Max (Worst) | Average | Median | Std.dev
k = 300
GH-VNS35.33373 × 1087.29800 × 1085.85377 × 1085.52320 × 1085.59987 × 107
AdaptiveGreedy5.27254 × 1087.09410 × 1085.59033 × 1085.38888 × 1084.60585 × 107
k = 30
GH-VNS23.42528 × 10103.48723 × 10103.43916 × 10103.43474 × 10101.46818 × 108
GH-VNS33.42528 × 10103.46408 × 10103.43731 × 10103.43474 × 10107.81989 × 107
AdaptiveGreedy3.42528 × 10103.46274 × 10103.43337 × 10103.43473 × 10108.13882 × 107
