A Fast Algorithm to Initialize Cluster Centroids in Fuzzy Clustering Applications

Abstract: The goal of partitioning clustering analysis is to divide a dataset into a predetermined number of homogeneous clusters. The quality of the final clusters from a prototype-based partitioning algorithm is highly affected by the initially chosen centroids. In this paper, we propose the InoFrep, a novel data-dependent initialization algorithm for improving computational efficiency and robustness in prototype-based hard and fuzzy clustering. The InoFrep is a single-pass algorithm using the frequency polygon data of the feature with the highest peak count in a dataset. Using the Fuzzy C-means (FCM) clustering algorithm, we empirically compare the performance of the InoFrep on one synthetic and six real datasets to those of two common initialization methods: random sampling of data points and K-means++. Our results show that the InoFrep algorithm significantly reduces the number of iterations and the computing time required by the FCM algorithm. Additionally, it can be applied to multidimensional large datasets because of its shorter initialization time and its independence from dimensionality, since it works with only one feature, the one with the highest number of peaks.


Introduction
Cluster analysis is one of the main tools of exploratory data analysis in many fields of research and in industrial applications requiring image segmentation, computer vision and pattern analysis. The partitioning-based algorithms (a.k.a. non-hierarchical or flat algorithms) are probably the most popular among the existing clustering algorithms. A major part of the partitioning algorithms are based on iterative optimization techniques [1]. An iterative optimization task starts with an initial partition of the data, and the partition is then iteratively updated by applying a local search algorithm until a convergence criterion is satisfied. Iterations are made by relocating data points between the clusters until a locally optimal partition is found. Since the number of data points in any dataset is always finite, the number of distinct partitions is also finite. The local minima problem could be avoided by using a globally optimal partitioning method [2], but such exhaustive search methods are ineffective in practice because they require too much computation time to reach the globally optimal result. Therefore, a more practical approach is to apply the iterative algorithms, which can be divided into two categories: prototype-based and distribution-based algorithms. The prototype-based algorithms assume that the characteristics of the instances in a cluster can be represented by a cluster prototype, which is a point in the data space. Such algorithms use c prototypes and assign the n instances to the clusters according to their proximity to the prototypes. The objective is to find clusters that are compact and well-separated from each other.
Although the prototypes of clusters can be centroids or medoids, the former are generally used in most applications. The validity of clustering results is closely related to the accurate choice of the initial cluster centroids, even if an algorithm itself overcomes the coincident clusters problem and is relatively faster than the others. A partitioning algorithm cannot guarantee convergence to an optimal result because the performance of partitioning depends upon the chosen initial cluster centroids. Thus, the initialization of a prototype-based clustering algorithm is an important step, since different choices of the initial cluster centroids can potentially lead to different local optima or different partitions [3]. To get better results, the clustering algorithm, that is, K-means or Fuzzy C-means (FCM), is run several times, and in each of these runs the algorithm is started with a different set of initial cluster centroids [4]. But this is a highly time-consuming approach, especially for high-dimensional data. For this reason, the initialization of the partition-based clustering algorithms is a matter of interest [2]. Consequently, faster algorithms estimating the initial cluster centroids are needed in partitioning cluster analyses. The InoFrep (Initialization on Frequency Polygons) algorithm proposed in this paper is a simple data-dependent initialization algorithm based on the frequency polygons of the features in a dataset. The algorithm assumes that the peaks (or modes) in frequency polygons are estimates of central tendency locations, that is, the centers of the different dense regions, namely the clusters, in an examined dataset. Thus, the peaks in frequency polygons can be used for determining the initial cluster centroids in prototype-based cluster analyses.

Fuzzy C-Means Clustering and Fuzzy Validity Indices
In a comprehensive survey [5], it is concluded that the EM and FCM clustering algorithms show excellent performance with respect to the quality of the clustering outputs but suffer from high computational time requirements. Hence, the authors of Reference [4] addressed possible solutions relying on programming which may allow such algorithms to be executed more efficiently for big data. In this study, because of its high performance and popularity in the literature, we use the original Fuzzy C-means (FCM) clustering algorithm [6] as the representative of prototype-based clustering algorithms. As a soft clustering algorithm, the FCM differs from the hard K-means algorithm by using weighted squared errors instead of squared errors only. Let X = {x1, x2, . . . , xn} be a dataset to be analyzed and V = {v1, v2, . . . , vc} be the set of centroids of the clusters in the dataset X in p-dimensional space (R^p), where n is the number of instances, p is the number of features and c is the number of partitions or clusters. For the dataset X, the FCM minimizes the objective function in Equation (1).
In Equation (1), U of n × c dimension is the membership degrees matrix for a fuzzy partition of X.
The element u_ik is the membership value of the kth instance to the ith cluster. Thus, the ith column of U consists of the membership values of the n instances to the ith cluster. V is a cluster prototypes matrix defined in Equation (3). In Equation (1), d^2_ikA is the distance between the kth data point and the centroid of the ith cluster. It is computed using a squared inner-product distance norm as in Equation (4), where A is a positive-definite and symmetric norm matrix. The inner product with A is a measure of the distances between data points and cluster prototypes. When A equals the identity matrix I, d^2_ikA reduces to the squared Euclidean norm. In Equation (1), m is a fuzzifier parameter (or weighting exponent) whose value is chosen as a real number greater than 1 (m ∈ (1, ∞)). As m approaches 1, clustering tends to become crisp; as it approaches infinity, clustering becomes more fuzzified. The value of m is usually set to 2 in most applications. The objective function J_FCM is minimized subject to the constraints given in Equations (5)-(7). The FCM stops when the number of iterations reaches a predefined maximum or when the difference between the membership matrices U obtained in two consecutive iterations is less than a predefined convergence value (ε). The steps involved in the FCM are:

1. Initialize the membership matrix U and the prototype matrix V.
2. Update the prototype matrix V and then the membership values in U with the update equations.
3. If ||U(t) − U(t−1)|| < ε then stop; otherwise go to Step 2, where t is the iteration number.
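The steps above can be sketched in a few lines. This is a minimal NumPy sketch, not the paper's implementation (which is in R); it assumes m = 2 by default and A = I, i.e., the squared Euclidean norm:

```python
import numpy as np

def fcm(X, V, m=2.0, eps=1e-4, max_iter=100):
    """Minimal FCM loop: X is an (n, p) data matrix, V a (c, p) initial centroid matrix."""
    U_prev = np.zeros((X.shape[0], V.shape[0]))
    for t in range(max_iter):
        # Step 2a: squared Euclidean distances d^2_ik between points and centroids (A = I)
        d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
        d2 = np.fmax(d2, 1e-12)                     # guard against division by zero
        # Step 2b: membership update, u_ik proportional to d_ik^(-2/(m-1))
        inv = d2 ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)    # rows of U sum to 1 (the constraint on U)
        # Step 2c: prototype update, weighted means with weights u_ik^m
        W = U ** m
        V = (W.T @ X) / W.sum(axis=0)[:, None]
        # Step 3: stop when the memberships change less than eps between iterations
        if np.abs(U - U_prev).sum() < eps:
            break
        U_prev = U
    return U, V, t + 1
```

The number of iterations t needed to reach convergence is exactly the quantity that the initialization algorithms compared in this paper try to reduce.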
For evaluating the effect of initialization algorithms on the clustering results of the FCM, we use the fuzzy clustering validation indices listed in Table 1. The indices of Partition Entropy and Modified Partition Coefficient use partition matrix U only, whereas the indices of Xie-Beni, Kwon and PBMF use U, V and X as shown in the formulas in Table 1. Therefore, even if the latter ones require more execution time, it is expected that they may give more accurate validation of partitioning by using the dataset itself and centroids matrix in addition to the fuzzy membership matrix.
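For the two indices that use U only, the standard formulas can be sketched as follows (PE = −(1/n) Σ u_ik log u_ik and MPC = 1 − c/(c−1)·(1 − PC) with PC = (1/n) Σ u_ik²; a Python sketch with illustrative function names, not the R code used in the paper):

```python
import numpy as np

def partition_entropy(U):
    """PE = -(1/n) * sum_ik u_ik * log(u_ik); 0 for crisp partitions, larger when fuzzier."""
    return -np.sum(U * np.log(np.fmax(U, 1e-12))) / U.shape[0]

def modified_partition_coefficient(U):
    """MPC = 1 - c/(c-1) * (1 - PC) with PC = (1/n) * sum_ik u_ik^2; 1 for crisp partitions."""
    n, c = U.shape
    pc = np.sum(U ** 2) / n
    return 1.0 - (c / (c - 1.0)) * (1.0 - pc)
```

Because both depend only on U, they are cheap to evaluate, which is the trade-off against the dataset-aware indices (Xie-Beni, Kwon, PBMF) noted above.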

Table 1. The fuzzy cluster validity indices used in this study (Partition Entropy, Modified Partition Coefficient, Xie-Beni, Kwon and PBMF) and their formulas.

Related Works on Initialization of Cluster Centroids
To generate the initial cluster centroids matrix V in the first step of prototype-based algorithms, the principal rule is to find data points that are close enough to the final centers of the clusters while being reasonably far from each other for different clusters. In this case, convergence to a good clustering result will be quicker. For this goal, we could iterate over all the points to determine where the distances between them are maximal. However, such an iterative approach is ineffective and is already done by the partitioning algorithms themselves. Hence, computationally effective methods are needed, and many of them are already present in the literature. In a comprehensive review [1], the initialization methods are broadly categorized into three groups: data-independent methods, simple data-dependent methods and sophisticated data-dependent methods. The data-independent methods completely ignore the data points. On the other hand, the simple data-dependent methods use the data points in initialization by randomly sampling them, whereas the sophisticated data-dependent methods use data points in more complicated fashions. Despite their simplicity, the data-independent methods have many disadvantages and are hence not preferred in clustering applications.
The initialization by random sampling on datasets (so-called Irand in this paper) is the simplest data-dependent method, in which random samples are drawn from the dataset without replacement to be used as the prototypes of the clusters. The Irand has been applied in many clustering implementations due to its simplicity and computational efficiency. The grid block method [12] divides the data space into blocks and searches for the dense regions. A grid block is considered an indicator of a cluster center if the number of data points in it is greater than a given threshold value. Although this method works well for two-dimensional datasets, it has some disadvantages for multidimensional data and also presents difficulties in selecting the thresholds.
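The Irand scheme described above amounts to drawing c distinct data points as prototypes. A minimal Python sketch (the function name is illustrative; the paper's implementation is an R function):

```python
import numpy as np

def irand_init(X, c, rng=None):
    """Irand: draw c distinct data points (sampling without replacement) as initial prototypes."""
    rng = np.random.default_rng(rng)
    idx = rng.choice(X.shape[0], size=c, replace=False)   # no replacement: no coincident seeds
    return X[idx]
```

Sampling without replacement is what prevents two clusters from being seeded at the same data point.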
Although there are several sophisticated data-dependent approaches, for example, Particle Swarm Optimization [13], the most interesting representatives of these methods are Mountain clustering [14] and Subtractive clustering [15]. Mountain clustering [14] is a method for approximate estimation of cluster centers on the basis of density measures. Despite the relative simplicity and effectiveness of this method, its computational cost increases exponentially as the dimensionality of the patterns grows, since the method must evaluate the mountain function over all grid points [3]. In Subtractive clustering, an alternative one-pass algorithm [15], the data points themselves, instead of grid points, are processed as the candidate cluster centroids of the dataset. With this method, the computational cost is simply proportional to the number of data points and free from the dimensionality problem that arises with the Mountain method. Applying these methods is difficult because they require input parameters which must be configured by the users [3]. The K-means++ [16] is another approximation algorithm, overcoming the poor clustering problem which sometimes happens with the classical K-means algorithm. K-means++ (called 'kmpp' in this paper) initializes the cluster centers by selecting data points that are farther away from each other in a probabilistic manner. The kmpp is a recommended method in clustering applications because of its several advantages over the methods discussed above.
The first two methods discussed above are deterministic, giving the same cluster centroids in every run on the same dataset, while the kmpp is non-deterministic. In some studies the deterministic methods are recommended because of their lower computational complexity, but others suggest using the non-deterministic ones because of their empirically proven effectiveness on real datasets [17]. For this reason, we have selected the Irand as the representative of simple data-dependent methods and the kmpp as the representative of sophisticated data-dependent methods. As shown in the following sections, the InoFrep is a data-dependent algorithm that uses the peaks on the frequency polygon of the feature with the highest number of peaks. Using the values of these peaks as the initial values of the cluster centers, the InoFrep enables the clustering algorithms to approach the final clustering results faster. This significantly reduces the number of iterations and the computation time required by the clustering algorithms. Since the initial cluster values are determined in a single pass of the algorithm, it also provides the advantage of using the same initial values in repeated runs of the clustering algorithms. In the following sections, we introduce the InoFrep and compare its effectiveness with those of the Irand and the kmpp.
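The probabilistic seeding of the kmpp can be sketched as follows: each new center is a data point drawn with probability proportional to its squared distance to the nearest already-chosen center (a minimal Python sketch; the comparison code in the paper is in R):

```python
import numpy as np

def kmpp_init(X, c, rng=None):
    """K-means++ style seeding: centers are data points picked far apart probabilistically."""
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    centers = [X[rng.integers(n)]]                  # first center: a uniform random point
    for _ in range(c - 1):
        # squared distance of each point to its nearest already-chosen center
        d2 = np.min(((X[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(axis=2), axis=1)
        # points far from all chosen centers are proportionally more likely to be picked
        centers.append(X[rng.choice(n, p=d2 / d2.sum())])
    return np.array(centers)
```

Note that a point already chosen has d2 = 0 and can never be chosen again, so the seeds are always distinct.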

Proposed Algorithm: Initialization on Frequency Polygons
To explain the logic behind the proposed algorithm, a small numerical dataset of 10 observations of two features (p1 and p2) is given as follows:

p1 = {…, 8, 7, 4, 8, 4, 3, 8, 9, 4}
p2 = {5, 6, 5, 4, 5, 6, 5, 4, 5, 5}

As seen from the scatter plot of p1 vs p2 in Figure 1a, two well-separated clusters exist in this two-dimensional simple dataset. As demonstrated in the figure, the center points of these two clusters are v1 = (4, 5) and v2 = (8, 5). If the cluster centroids are initialized with values close to these central points (v1 and v2), clustering algorithms will approach the actual cluster centers within a few iterations. Thus, starting the clustering algorithms with initial values close to the real cluster centers can remarkably reduce the computing time required in cluster analysis. In descriptive statistics, histograms and frequency polygons are used as visual tools for understanding and comparing the shapes of the distributions of features in a dataset [18]. In a frequency polygon, the x-axis represents the values of the classes of a feature and the y-axis indicates the frequency of each class. Therefore, frequency polygons also provide structural information about the data. The values of the peaks of a feature are the modes of the data, representing the most repeated instances [18], and thus they can be used as the prototypes of cluster centers in datasets. The histograms and frequency polygons of the features p1 and p2 in our simple example data are shown in Figure 1b,c, respectively. The values (pv1 and pv2) and frequencies (pc1 and pc2) of these peaks are given below.

As shown by the frequency polygon mid values above, there are five peaks for the feature p1 and two peaks for the feature p2. Since the peaks indicate the presence of subgroups or clusters in the studied data, we can assume that there are 5 clusters according to the first feature and 2 clusters according to the second feature. The value of the peak with the highest frequency can be used as the center coordinate of the first cluster, which in our example is 3.75. Then the value of the peak with the second highest frequency is used as the initial value of the second cluster, which is 7.75 in our example. These initial values are very close to the actual cluster centers for p1. When the same operations are done for p2, the peak values 4.75 and 5.75 are assigned as the initial values of the first and second clusters. If the number of peaks determined for a feature is less than the number of clusters (parameter c) given as an input argument of the cluster analysis, the remaining clusters can be initialized with random sampling.
For finding the peaks and obtaining the peak values to be used as the initial centroids, we have developed the findpolypeaks algorithm (Algorithm 1). The input arguments of this algorithm are the frequencies and middle values of the classes of the frequency polygon of the analyzed feature (xc and xm, respectively) and a threshold count value (tc) for filtering purposes. The output returned by the algorithm is a peaks matrix PM. At the beginning of the findpolypeaks, the frequencies and middle values of the frequency polygon are filtered and the frequencies not exceeding the threshold value tc are removed from xc. The default value of tc is 1, which means that all frequencies of 0 and 1 are removed from xc because they are not needed or might be noise (Line 1 in Algorithm 1). In this way, the valleys and possible noise in the frequency vector of the frequency polygon are eliminated from xc and xm, making the process faster and more robust. Then, the number of classes in xc is computed (nc) and an index for the peaks (pidx) is initialized to 1.
If xc contains only one element (one frequency value), it is returned as the peak of the analyzed feature (Line 24 in Algorithm 1). Otherwise, the frequencies in xc are examined to find the peaks of the analyzed feature (Lines 6-22 in Algorithm 1). If the first frequency value in xc is greater than the second, it is assigned as the first peak value and pidx, the index for peaks, is increased by 1 (Lines 7-10 in Algorithm 1). Then a loop is performed over the remaining frequency values to find the other peaks (Lines 11-19 in Algorithm 1). If the ith frequency value is greater than the previous ((i − 1)th) and next ((i + 1)th) frequency values in xc, it is flagged as a peak and pidx is increased by one (Lines 14-16 in Algorithm 1). One last check is performed to determine whether a final peak exists at the end of xc (Lines 20-22 in Algorithm 1). Finally, the peaks matrix PM, consisting of np rows and 2 columns, is generated and returned by the findpolypeaks. The values and the frequencies of the peaks found by the algorithm are stored in the first and second columns of PM, respectively (Line 28 in Algorithm 1).
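The peak-finding logic of Algorithm 1 can be sketched as follows (a Python sketch of the steps described above; the paper's implementation is in R, and details such as the line numbering differ):

```python
import numpy as np

def findpolypeaks(xc, xm, tc=1):
    """Sketch of Algorithm 1: xc = class frequencies, xm = class mid values of a frequency
    polygon, tc = threshold count. Returns a peaks matrix PM of np rows and 2 columns
    holding the peak values and the peak frequencies."""
    xc, xm = np.asarray(xc, float), np.asarray(xm, float)
    keep = xc > tc                       # filter: drop frequencies of 0..tc (valleys, noise)
    xc, xm = xc[keep], xm[keep]
    nc = len(xc)
    peaks = []
    if nc == 1:                          # a single remaining class is itself the peak
        peaks.append((xm[0], xc[0]))
    elif nc > 1:
        if xc[0] > xc[1]:                # the first class may be a peak
            peaks.append((xm[0], xc[0]))
        for i in range(1, nc - 1):       # interior classes: strict local maxima
            if xc[i] > xc[i - 1] and xc[i] > xc[i + 1]:
                peaks.append((xm[i], xc[i]))
        if xc[-1] > xc[-2]:              # one last check for a peak at the end
            peaks.append((xm[-1], xc[-1]))
    return np.array(peaks).reshape(-1, 2)
```

Because the filtering step removes the low-frequency valleys first, adjacent dense regions that would otherwise merge into a plateau are separated before the local-maximum scan.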
The InoFrep algorithm (Algorithm 2) uses three input arguments: X (n × p), the dataset as a matrix (n: number of instances, p: number of features); c, the number of clusters; and nc, the number of classes for generating the frequency polygons. Here, nc is determined heuristically. If a number greater than the actual number of clusters in the dataset is chosen for nc, the algorithm removes the gaps between the bins, so this does not become a major problem for finding the peaks. For instance, in our experiments with the synthetic dataset in the next section, where nc is chosen as 20 while the actual number of clusters is 4, the algorithm does not struggle to determine the peak counts. The output of the algorithm is the initial centroids matrix of c rows and p columns. In the initialization phase of the algorithm, all elements of the V matrix are set to 0 and an atomic vector peakcounts is generated to store the frequencies of peaks in the analyzed frequency polygon (Lines 1-2 in Algorithm 2). The frequency polygon of feature j is generated and the mid values and frequencies of its classes are stored in two atomic vectors (jmids and jcounts, respectively; Lines 4-5 in Algorithm 2). The algorithm findpolypeaks is called with these input arguments and the number of peaks for feature j is stored as the jth value of the vector peakcounts (Lines 7-8 in Algorithm 2). Then, the index of the feature with the highest peak count is determined as maxj and its frequency polygon is generated (Lines 10-11 in Algorithm 2). Next, findpolypeaks is called with the middle values and frequencies of the classes of the frequency polygon for the feature maxj, the feature with the maximum peak count. The returned peaks matrix PM is ordered by peak frequency in descending order and the PMS matrix is obtained (Lines 12-18 in Algorithm 2). The peak values in the first column of PMS are used to find the closest data points to them in the dataset.
The found data point of the feature maxj is assigned as the centroid of the ith cluster (Line 21 in Algorithm 2). If the number of peaks (np) is less than the number of clusters to be used by the clustering algorithm, the centroids of the remaining c − np clusters are generated from randomly sampled data points of the feature maxj (Line 25 in Algorithm 2). The randomly sampled points are checked for duplicates to prevent coincident cluster centroids (Lines 26-32 in Algorithm 2). The processes described above are repeated until the number of clusters is reached and, finally, the initial centroids matrix V is returned to the clustering algorithm.
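Putting the pieces together, the overall flow of Algorithm 2 can be sketched as follows (a simplified Python sketch, assuming histogram classes stand in for the frequency polygon and tc = 1; the paper's R implementation additionally performs the duplicate checks of Lines 26-32, which this sketch omits):

```python
import numpy as np

def inofrep_init(X, c, nc=20, rng=None):
    """Sketch of Algorithm 2: choose the feature whose frequency polygon has the most peaks,
    then seed centroids at the data points closest to its strongest peaks. Rows beyond the
    available peak count fall back to random sampling."""
    rng = np.random.default_rng(rng)
    n, p = X.shape
    best_j, best_peaks = 0, np.empty((0, 2))
    for j in range(p):                               # count peaks per feature
        counts, edges = np.histogram(X[:, j], bins=nc)
        mids = (edges[:-1] + edges[1:]) / 2
        keep = counts > 1                            # tc = 1: drop frequencies of 0 and 1
        cnt, mid = counts[keep], mids[keep]
        is_peak = [(i == 0 or cnt[i] > cnt[i - 1]) and
                   (i == len(cnt) - 1 or cnt[i] > cnt[i + 1]) for i in range(len(cnt))]
        peaks = np.column_stack([mid[is_peak], cnt[is_peak]]) if len(cnt) else np.empty((0, 2))
        if len(peaks) > len(best_peaks):
            best_j, best_peaks = j, peaks
    order = np.argsort(-best_peaks[:, 1])            # PMS: peaks sorted by frequency, descending
    V = np.zeros((c, p))
    for i in range(c):
        if i < len(best_peaks):                      # nearest data point to the i-th strongest peak
            k = int(np.argmin(np.abs(X[:, best_j] - best_peaks[order[i], 0])))
        else:                                        # fewer peaks than clusters: random sampling
            k = int(rng.integers(n))
        V[i] = X[k]
    return V
```

Since only the single feature with the maximum peak count is scanned for the final centroid assignment, the cost of the seeding step grows with n but not with the dimensionality p beyond the initial per-feature peak counting, which is the independence-from-dimensionality property claimed above.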

Experiment on a Synthetic Dataset
In this study, the findpolypeaks and the InoFrep algorithms have been implemented in R [19] and tested on a computer with an i7-6700HQ CPU (2.60 GHz) and 16 GB RAM. For comparison of the InoFrep with the others, we have also coded R functions for the kmpp and the Irand algorithms (see Supplementary Materials). To evaluate the performance of the compared algorithms, we generated a synthetic dataset (3P_4C) by using the rnorm function of the base stats library of R. The dataset consists of three Gaussian mixture features with the descriptive statistics shown in Table 2. The first feature (p1) is unimodal, the second feature (p2) is four-modal and the third feature (p3) is three-modal, as seen in Figure 2a. Although the number of instances in the created synthetic dataset is arbitrarily chosen as 400 so as to easily monitor the distribution and scattering of the points in the graphics, working with a smaller or larger number of instances does not affect the relative success of the proposed algorithm because it only uses the modes to initialize the cluster centers.

In our experiment, we run the FCM for six levels of the number of clusters (c = 2, . . . , 7) with each of the three initialization algorithms (InoFrep, kmpp and Irand). In each level of the number of clusters, the FCM is started ten times because the Irand and the kmpp algorithms determine different centroids in different runs due to their non-deterministic nature. In order to prevent possible biases due to different initializations of the membership matrix U, we used the same U matrix for each level of the number of clusters in the repeated runs of the FCM.
The results obtained from the FCM runs on the 3P_4C dataset are shown in Table 3. In this table, imin, iavg, imax and isum, respectively, stand for the minimum, average, maximum and total number of iterations in ten runs of the FCM. As another performance criterion, ctime in Table 3 shows the total computing time (in milliseconds) for ten runs of the FCM. In the last row of Table 3, itime stands for the average computing time of the initialization algorithms, for evaluating their initialization performances. As seen in Table 3, the InoFrep requires a smaller number of iterations and less computing time when compared to the kmpp and the Irand (the best results are shown in bold in the table). The kmpp is in the middle and the Irand is the worst (Chi-Sq. = 26.503, df = 10, p = 0.00312). As clearly seen from Figure 3a,b, the performances of all of the algorithms converge to each other when c is 7. If the number of clusters processed by the FCM is greater than the maximum peak count found by the InoFrep, the centroids for the last c − np clusters are generated with the random sampling technique (see Line 25 in Algorithm 2). In this case, the performance of the InoFrep becomes similar to the performances of the kmpp and the Irand; however, this is a rare occasion for most data and, moreover, running the FCM for such large c values will not be reasonable. In parallel to the number of iterations, the computing times required by the FCM are also significantly different between the compared initialization algorithms (Chi-Sq. = 279.58, df = 10, p < 2.2 × 10^−16). According to the results in Table 3, the InoFrep requires less computing time than the kmpp and the Irand. The InoFrep is especially better than the kmpp and the Irand when the number of clusters approaches the number of actual clusters in the analyzed datasets.
Moreover, another superiority of the InoFrep is its stability between different runs of the FCM. While the kmpp and the Irand do not ensure the same initialization values from one run to another, the InoFrep produces the same values across runs of the clustering algorithms for cluster counts up to the number of peaks (np). Because the InoFrep is a semi-deterministic algorithm, it does not need repeated runs to search for a better initialization. In other words, just one run of the InoFrep guarantees the same initialization results as long as the number of clusters (c) passed to the FCM does not exceed np for the selected feature. Consequently, the number of iterations required by the FCM with the initial centroids generated by the InoFrep is significantly smaller than with the compared algorithms, which indicates that the InoFrep has higher computational efficiency. At the same time, since the algorithm uses the modes of the features, it takes the present structure of the dataset into account and hence reinforces robustness to noise.
In our study, the fuzzy index values computed from the membership matrices returned by all the FCM runs are the same. As seen in Table 4, the indices XB and Kwon suggest three clusters while PBMF, MPC and PE suggest four clusters. As visible in Figure 2b above, three or four natural groupings might be obtained for the 3P_4C synthetic dataset. Therefore, although both of these results are acceptable, we conclude that PBMF, MPC and PE suggest an accurate number of clusters for the examined dataset.

In the literature, performance evaluation of initialization algorithms focuses mostly on the comparison of the number of iterations and the computing time required by the clustering algorithms, as done above. In this study, we have also investigated the performance of the initialization step itself. As seen in Figure 4 and the last row of Table 3, the time required by the three initialization algorithms (itime) differs significantly. The InoFrep required less initialization time at all levels of the number of clusters. The initialization time of the kmpp increases linearly and is longer than those of the Irand and the InoFrep. The initialization times required by the InoFrep and the Irand are more or less close to each other, although it is longer for the Irand for the clustering at c = 7.

Experimental Results on Real Datasets
In order to compare the performances of the tested initialization algorithms, we used six real datasets from the UCI Machine Learning Repository [20]. The definitions of these datasets are given in Table 5. In this table, p, c, ec and sp respectively stand for the number of features, the number of clusters, the number of clusters estimated by the fuzzy indices in Table 1 and the index of the selected feature with the largest number of peaks in the related datasets. The results obtained with the InoFrep, the kmpp and the Irand on the real datasets are given in Tables 6-8, respectively. In regard to the average number of iterations (iters) and the computing time (ctime) required by the FCM, the InoFrep performs relatively better than the kmpp and the Irand for most of the real datasets. The InoFrep outperforms them in the clustering sessions done with cluster numbers equal or close to the actual cluster numbers in the examined real datasets. The Irand is also good in some clustering sessions for the Iris, Forest and Wine datasets, especially in the sessions done with a large number of clusters. The InoFrep algorithm uses the same technique as the Irand for clustering done with cluster numbers above the actual number of clusters in an examined dataset.
