# Hybrid Fuzzy C-Means Clustering Algorithm Oriented to Big Data Realms


## Abstract


## 1. Introduction

The time complexity of FCM is $O(ndc^{2}t)$, where $n$ is the number of objects, $d$ is the number of dimensions, $c$ is the number of clusters, and $t$ is the number of iterations. In this paper, the terms dimensions and attributes are used interchangeably to refer to the characteristics of the objects.

## 2. Related Work

## 3. Materials and Methods

#### 3.1. K-Means Algorithm

Let $X = \{x_1, \dots, x_n\}$ be the set of $n$ objects to be partitioned by a criterion of similarity, where $x_i \in \mathbb{R}^d$ for $i = 1, \dots, n$, and $d \geq 1$ is the number of dimensions. Furthermore, let $k \geq 2$ be an integer and $K = \{1, \dots, k\}$. For a $k$-partition $P = \{G(1), \dots, G(k)\}$ of $X$, denote by $v_j$ the centroid of cluster $G(j)$, for $j \in K$, and let $V = \{v_1, \dots, v_k\}$ and $W = \{w_{11}, \dots, w_{ij}\}$.

Here, $w_{ij} = 1 \Leftrightarrow$ the object $x_i$ belongs to cluster $G(j)$, and $D(x_i, v_j)$ denotes the Euclidean distance between $x_i$ and $v_j$ for $i = 1, \dots, n$ and $j = 1, \dots, k$. The pseudocode of the standard K-Means algorithm is shown in Algorithm 1 [24].

Algorithm 1: Standard K-Means | |
---|---|
1 | Initialization: |
2 | X := {x_{1}, …, x_{n}}; |
3 | V := {v_{1}, …, v_{k}}; |
4 | Classification: |
5 | For x_{i} ∈ X and v_{k} ∈ V { |
6 | Calculate the Euclidean distance from each x_{i} to the k centroids; |
7 | Assign the x_{i} object to the nearest v_{k} centroid; } |
8 | Calculate centroids: |
9 | Calculate the centroid v_{k}; |
10 | Convergence: |
11 | If V := {v_{1}, …, v_{k}} does not change in two consecutive iterations: |
12 | Stop the algorithm; |
13 | Otherwise: |
14 | Go to Classification |
15 | End of algorithm |
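The steps of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name `kmeans`, the `max_iter` safety cap, and the rule of keeping the previous centroid when a cluster becomes empty are assumptions introduced here.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Standard K-Means: classify, recompute centroids, stop when V is unchanged.

    max_iter is an assumed safety cap that Algorithm 1 does not mention.
    """
    centroids = [tuple(v) for v in centroids]
    assignment = []
    for _ in range(max_iter):
        # Classification: index of the nearest centroid for every object.
        assignment = [
            min(range(len(centroids)), key=lambda j: math.dist(x, centroids[j]))
            for x in points
        ]
        # Calculate centroids: mean of the objects assigned to each cluster.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [x for x, a in zip(points, assignment) if a == j]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(old)  # assumption: keep an empty cluster's centroid
        # Convergence: V did not change between two consecutive iterations.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assignment


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
final_centroids, labels = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(final_centroids)  # [(0.0, 0.5), (10.0, 10.5)]
print(labels)           # [0, 0, 1, 1]
```

Comparing centroids for exact equality mirrors line 11 of the pseudocode; with floating-point data a small tolerance may be preferable in practice.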

#### 3.2. K++ Algorithm

Algorithm 2: K++ | |
---|---|
1 | Initialization: |
2 | X := {x_{1}, …, x_{n}}; |
3 | Assign the value for k; |
4 | V := Ø; |
5 | Select the first centroid v_{1} uniformly at random from X; V := V ∪ {v_{1}}; |
6 | For i = 2 to k: |
7 | Select the i-th centroid v_{i} from X with probability D(x_{i}, v_{j})/∑_{x ∈ X} D(x_{i}, v_{j}); |
8 | V := V ∪ {v_{i}}; |
9 | End of for |
10 | Return V |
11 | End of algorithm |

The algorithm selects the first centroid $v_1$ uniformly at random from dataset $X$ and assigns it to the set of centroids $V$. From the second centroid onward, until all $k$ centroids have been chosen, the $i$-th centroid is selected according to a probability distribution based on the distances $D$. In this paper, the algorithm is referred to as K++ when it is used only to initialize the centroids.
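The seeding step can be sketched as below. The sketch weights each object by its squared distance to the nearest centroid already chosen, as in the k-means++ seeding of Arthur and Vassilvitskii; Algorithm 2 writes the weight simply as D(x_{i}, v_{j}), so the squaring and the "nearest chosen centroid" reading are assumptions of this example, as is the function name `kpp_init`.

```python
import math
import random


def kpp_init(points, k, rng=None):
    """Choose k initial centroids from points (assumes at least k distinct points)."""
    rng = rng or random.Random()
    # First centroid: uniformly at random from X.
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # Weight of each object: squared distance to its nearest chosen centroid
        # (assumption: k-means++ style weighting).
        weights = [min(math.dist(x, v) ** 2 for v in centroids) for x in points]
        # Draw the next centroid with probability proportional to its weight.
        centroids.append(rng.choices(points, weights=weights, k=1)[0])
    return centroids


points = [(0.0, 0.0), (0.0, 0.0), (5.0, 5.0)]
seeds = kpp_init(points, 2, random.Random(0))
```

Objects that coincide with an already chosen centroid get weight zero, so in this toy dataset the two seeds are always the two distinct locations, whichever one is drawn first.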

#### 3.3. O-K-Means Algorithm

Algorithm 3: O-K-Means | |
---|---|
1 | Initialization: |
2 | X := {x_{1}, …, x_{n}}; |
3 | V := {v_{1}, …, v_{k}}; |
4 | εok := Threshold value for determining O-K-Means convergence; |
5 | Classification: |
6 | For x_{i} ∈ X and v_{k} ∈ V { |
7 | Calculate the Euclidean distance from each x_{i} to the k centroids; |
8 | Assign the x_{i} object to the nearest v_{k} centroid; |
9 | Compute γ }; |
10 | Calculate centroids: |
11 | Calculate the centroid v_{k}; |
12 | Convergence: |
13 | If (γ ≤ εok): |
14 | Stop the algorithm; |
15 | Otherwise: |
16 | Go to Classification |
17 | End of algorithm |

The indicator γ is computed in each iteration $t$ as $\gamma_t = 100(o_t/n)$, where $o_t$ is the number of objects that change cluster in iteration $t$.
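As a small illustration of this stopping rule, the sketch below computes the percentage of objects that changed cluster between two consecutive assignments and compares it with the threshold εok. The function names and the example threshold values are hypothetical.

```python
def change_percentage(previous, current):
    """gamma_t = 100 * (o_t / n): share of objects whose cluster label changed."""
    if len(previous) != len(current) or not current:
        raise ValueError("assignments must be non-empty and of equal length")
    changed = sum(1 for a, b in zip(previous, current) if a != b)  # o_t
    return 100 * changed / len(current)


def has_converged(previous, current, threshold):
    """O-K-Means stops when gamma_t is at or below the threshold (eps_ok)."""
    return change_percentage(previous, current) <= threshold


# One of five objects moves from cluster 0 to cluster 1.
gamma = change_percentage([0, 0, 1, 1, 2], [0, 1, 1, 1, 2])
print(gamma)  # 20.0
```

With a threshold of 25 this iteration would stop the algorithm, while a threshold of 10 would trigger another classification pass.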

#### 3.4. FCM Algorithm

$X = \{x_1, \dots, x_n\}$ is the set of $n$ objects to be partitioned by a similarity criterion, where $x_i \in \mathbb{R}^d$ for $i = 1, \dots, n$, and $c$ is the number of clusters, where $2 \leq c < n$.

$U = \{\mu_{ij}\}$ is the membership matrix of each object $i$ to each cluster $j$; $V = \{v_1, \dots, v_c\}$ is the set of centroids, where $v_j$ is the centroid of cluster $j$; $m$ is the weighting exponent or fuzzy factor that indicates how much the clusters overlap, with $m > 1$; and $D(x_i, v_j)$ is the Euclidean distance between the object $x_i$ and the centroid $v_j$ for $i = 1, \dots, n$ and $j = 1, \dots, c$.

By minimizing the objective function $J_m$, an estimated model of $U$ and $V$ is obtained, where $x_i$ and $v_j$ are vectors that belong to the space $\mathbb{R}^d$ and are represented as

$$x_i = (x_1, x_2, \dots, x_d), \quad 1 \leq i \leq n \qquad (5)$$

$$v_j = (v_1, v_2, \dots, v_d), \quad 1 \leq j \leq c \qquad (6)$$
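For reference, the standard FCM formulation as commonly stated (following Bezdek), written with the symbols defined in this subsection, is given below. It is included as orientation only; that it matches the paper's own numbered equations term by term is an assumption.

$$J_m(U, V) = \sum_{i=1}^{n} \sum_{j=1}^{c} \mu_{ij}^{m} \, D(x_i, v_j)^2, \qquad \sum_{j=1}^{c} \mu_{ij} = 1 \ \text{for each } i$$

$$v_j = \frac{\sum_{i=1}^{n} \mu_{ij}^{m} \, x_i}{\sum_{i=1}^{n} \mu_{ij}^{m}}, \qquad \mu_{ij} = \frac{1}{\sum_{l=1}^{c} \left( \dfrac{D(x_i, v_j)}{D(x_i, v_l)} \right)^{2/(m-1)}}$$

The two update rules are applied alternately: centroids are recomputed from the current memberships, and memberships are recomputed from the new centroids.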

Algorithm 4: Standard FCM | |
---|---|
1 | Initialization: |
2 | Assign the value for c and m; |
3 | Determine the value of the threshold ε for convergence; |
4 | t := 0, TMAX := 50; |
5 | X := {x_{1}, …, x_{n}}; |
6 | U^{(t)} := {µ_{11}, …, µ_{ij}} is randomly generated; |
7 | Calculate centroids: |
8 | Calculate the centroid v_{j}; |
9 | Classification: |
10 | Calculate and update the membership matrix U^{(t+1)} := {µ_{ij}}; |
11 | Convergence: |
12 | If max [abs(µ_{ij}^{(t)} − µ_{ij}^{(t+1)})] < ε or t ≥ TMAX: |
13 | Stop the algorithm; |
14 | Otherwise: |
15 | U^{(t)} := U^{(t+1)} and t := t + 1; |
16 | Go to Calculate centroids |
17 | End of algorithm |
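A compact Python sketch of the loop in Algorithm 4 follows. It assumes the standard FCM update rules for centroids and memberships, uses hypothetical function names, and treats the defaults `m=2.0` and `eps=1e-5` as illustrative values; degenerate cases such as a cluster with zero total membership are not handled.

```python
import math
import random


def update_centroids(points, U, m):
    """Centroid v_j as the mean of the objects weighted by mu_ij ** m."""
    dims = range(len(points[0]))
    centroids = []
    for j in range(len(U[0])):
        weights = [row[j] ** m for row in U]
        total = sum(weights)
        centroids.append(
            tuple(sum(w * x[k] for w, x in zip(weights, points)) / total for k in dims)
        )
    return centroids


def update_memberships(points, centroids, m):
    """Membership mu_ij from the distances of x_i to all centroids."""
    U = []
    for x in points:
        dist = [math.dist(x, v) for v in centroids]
        zeros = [j for j, dj in enumerate(dist) if dj == 0.0]
        if zeros:
            # x_i coincides with a centroid: share full membership among them.
            row = [1.0 / len(zeros) if j in zeros else 0.0 for j in range(len(dist))]
        else:
            inv = [dj ** (-2.0 / (m - 1.0)) for dj in dist]
            row = [value / sum(inv) for value in inv]
        U.append(row)
    return U


def fcm(points, c, m=2.0, eps=1e-5, t_max=50, seed=None):
    rng = random.Random(seed)
    # Random initial membership matrix; each row is normalised to sum to 1.
    U = []
    for _ in points:
        row = [rng.random() + 1e-9 for _ in range(c)]
        U.append([value / sum(row) for value in row])
    for _ in range(t_max):  # at most TMAX iterations
        centroids = update_centroids(points, U, m)        # Calculate centroids
        new_U = update_memberships(points, centroids, m)  # Classification
        delta = max(abs(a - b) for old, new in zip(U, new_U) for a, b in zip(old, new))
        U = new_U
        if delta < eps:                                   # Convergence
            break
    return centroids, U


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centroids, U = fcm(points, c=2, seed=0)
labels = [row.index(max(row)) for row in U]  # hardened labels for inspection
```

Taking the largest membership per row gives a crisp labelling that is convenient for checking the result, although FCM itself keeps the full membership matrix.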

## 4. Proposal for Improvement

#### 4.1. Transformation Functions

Let $X = \{x_1, \dots, x_n\}$ be the set of $n$ objects to partition, where $x_i \in \mathbb{R}^d$ for $i = 1, \dots, n$, and $d \geq 1$ is the number of dimensions.

$V = \{v_1, \dots, v_c\}$ is the set of final centroids of a solution obtained with the O-K-Means algorithm, where $v_j$ is the centroid of cluster $j$.

$x_i$ and $v_j$ are the vectors described in Equations (5) and (6).

$U = \{\mu_{ij}\}$, where $\mu_{ij}$ is the membership of each object $x_i$ to each cluster $j$.
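As a worked illustration, suppose the transformation from centroids to memberships uses the standard FCM membership rule (an assumption of this example). Take $m = 2$ and an object $x_i$ whose distances to two centroids are $D(x_i, v_1) = 1$ and $D(x_i, v_2) = 3$; these numbers are hypothetical. Then

$$\mu_{i1} = \frac{1}{\left(\tfrac{1}{1}\right)^{2} + \left(\tfrac{1}{3}\right)^{2}} = \frac{1}{1 + \tfrac{1}{9}} = 0.9, \qquad \mu_{i2} = \frac{1}{\left(\tfrac{3}{1}\right)^{2} + \left(\tfrac{3}{3}\right)^{2}} = \frac{1}{9 + 1} = 0.1$$

The memberships sum to 1, and the object belongs mostly to the cluster whose centroid is three times closer.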

Algorithm 5: HOFCM | |
---|---|
1 | Initialization: |
2 | X := {x_{1}, …, x_{n}}; |
3 | V := {v_{1}, …, v_{c}}; |
4 | εok := Threshold value for determining O-K-Means convergence; |
5 | Assign the value for c; |
6 | i := 1; |
7 | Repeat |
8 | Function K++ (X, c): |
9 | Return V′; |
10 | Function O-K-Means (X, V′, εok, c): |
11 | Return V″; |
12 | i := i + 1; |
13 | While i ≤ 10; |
14 | Select the V″ of the repetition i in which the objective function obtained the minimum value; |
15 | Apply the transformation function s; |
16 | Determine the value of the threshold ε for the convergence of the algorithm; |
17 | Assign the value for m; |
18 | t := 1; |
19 | Calculate centroids: |
20 | Calculate the centroid v_{j}; |
21 | Classification: |
22 | Calculate and update the membership matrix U^{(t+1)} := {µ_{ij}}; |
23 | Convergence: |
24 | If max [abs(µ_{ij}^{(t)} − µ_{ij}^{(t+1)})] < ε: |
25 | Stop the algorithm; |
26 | Otherwise: |
27 | U^{(t)} := U^{(t+1)} and t := t + 1; |
28 | Go to Calculate centroids |
29 | End of algorithm |
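Lines 7 to 14 of Algorithm 5 repeat the initialization ten times and keep the centroids with the smallest objective value. The sketch below shows only that selection pattern. The helper `best_of_runs` is hypothetical, `run_once` is a stand-in for "K++ followed by O-K-Means", and the sum of squared distances on a one-dimensional toy dataset is an assumed objective chosen for illustration.

```python
import random


def best_of_runs(run_once, objective, repeats=10):
    """Run the initialization `repeats` times and keep the lowest-objective result."""
    best_solution, best_value = None, float("inf")
    for i in range(1, repeats + 1):
        solution = run_once(i)
        value = objective(solution)
        if value < best_value:
            best_solution, best_value = solution, value
    return best_solution, best_value


data = [0.0, 1.0, 10.0, 11.0]


def sse(centroids):
    # Assumed objective: squared distance of each object to its nearest centroid.
    return sum(min((x - v) ** 2 for v in centroids) for x in data)


def run_once(i):
    # Stand-in for K++ followed by O-K-Means: two seeded random picks from the data.
    rng = random.Random(i)
    return sorted(rng.sample(data, 2))


best, value = best_of_runs(run_once, sse)
```

Keeping only the best of several cheap K-Means-type runs is what lets the later FCM stage start from centroids that are already close to a good solution.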

## 5. Results

#### 5.1. Experiment Environment

The experiments were run on a computer with an Intel® Core™ i9-10900 processor (Santa Clara, CA, USA) at 2.80 GHz, 32 GB RAM, a 1 TB hard drive, and a Windows 10 Pro operating system.

#### 5.2. Description of Test Cases

#### 5.2.1. Description of Experiment I

#### 5.2.2. Description of Experiment II

#### 5.3. Analysis of Experiments

#### 5.3.1. Analysis of the Results of Experiment I

- HOFCM reduced the solution time in all test cases for all datasets. In the best case, the reduction was 93.94%. This percentage is highlighted in bold in column seven. Notably, it was achieved with the SPAM dataset, which has high dimensionality.
- HOFCM improved the quality of the solution in large datasets in 20 of 24 cases. In the best case, the quality of the solution was enhanced by 68.31%. This percentage is highlighted in bold in column eight. In the worst case, a quality loss of 2.19% was identified, as can be seen in column four. Regarding the small datasets, HOFCM obtained a solution quality equal to or better than that of standard FCM in 62.5% of the cases, that is, in 10 of the 16 test cases.
- In general, based on the results obtained, it is possible to affirm that the HOFCM proposal performs better than the standard FCM algorithm.
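The case counts quoted above can be checked directly against the % J_m columns of the Standard FCM versus HOFCM table. The short script below does this under two assumptions: the last unlabeled column pair of that table belongs to the 1m2d dataset, and "small" refers to WDBC and ABALONE while "large" refers to SPAM, URBAN, and 1m2d.

```python
# % J_m values per dataset, one entry per test case (c = 2, 4, 6, 8, 10, 14, 18, 26).
jm = {
    "WDBC": [0.00, 0.02, -2.19, -1.17, 3.03, 4.50, 12.39, 27.31],
    "ABALONE": [0.0035, -0.02, 0.00, 0.08, 0.09, -0.11, -0.60, -2.05],
    "SPAM": [-0.001, 6.53, 22.75, 36.90, 50.69, 49.30, 56.24, 68.31],
    "URBAN": [4.12, -0.001, 0.07, 1.42, 1.85, 4.26, 7.81, 10.73],
    "1m2d": [0.03, 0.03, 0.36, 0.22, -0.05, -0.27, 0.05, 0.02],
}
large = ["SPAM", "URBAN", "1m2d"]  # assumed grouping
small = ["WDBC", "ABALONE"]        # assumed grouping

# Large datasets: cases with a strictly positive quality gain.
improved_large = sum(v > 0 for name in large for v in jm[name])
total_large = sum(len(jm[name]) for name in large)

# Small datasets: cases with no quality loss (gain of zero or more).
not_worse_small = sum(v >= 0 for name in small for v in jm[name])
total_small = sum(len(jm[name]) for name in small)
share_small = 100 * not_worse_small / total_small

print(improved_large, total_large)                # 20 24
print(not_worse_small, total_small, share_small)  # 10 16 62.5
```

The 62.5% figure is reproduced only when the two test cases with a 0.00 difference are counted alongside the strict improvements.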

#### 5.3.2. Analysis of the Results of Experiment II

- HOFCM outperformed the FCM++ algorithm in solution time in all test cases with large datasets. With the small datasets, HOFCM was slower in only one case. Regarding the quality of the solution in large datasets, the best case was an average gain of 5.48%, and the worst case was a loss of 2.33%. Both percentages are in bold in column eight of Table 3. HOFCM was superior in solution quality in 75% of all cases.
- HOFCM outperformed the FCM-KMeans algorithm in solution quality in 82.5% of all test cases. Regarding the solution time, HOFCM was better in the large and real datasets.
- HOFCM outperformed the NFCM algorithm in solution quality in 85% of all test cases. Regarding the solution time, HOFCM was better in the large datasets.
- In all three comparisons, HOFCM obtained the greatest time reductions and the greatest gains in solution quality when dealing with large and real datasets.
- In general, based on the results obtained, it is possible to affirm that the HOFCM proposal performs better than the FCM++, FCM-KMeans, and NFCM algorithms.

## 6. Discussion

## 7. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

## References

1. Yang, M.S. A survey of fuzzy clustering. Math. Comput. Model. **1993**, 18, 1–16.
2. Nayak, J.; Naik, B.; Behera, H.S. Fuzzy C-Means (FCM) Clustering Algorithm: A Decade Review from 2000 to 2014. In Proceedings of the Comput Intell Data Mining, Odisha, India, 20–21 December 2014.
3. Shirkhorshidi, A.S.; Aghabozorgi, S.; Wah, T.Y.; Herawan, T. Big Data Clustering: A Review. In Proceedings of the International Conference on Computational Science and Its Applications—ICCSA 2014, Guimaraes, Portugal, 30 June–3 July 2014.
4. Ajin, V.W.; Kumar, L.D. Big data and clustering algorithms. In Proceedings of the 2016 International Conference on Research Advances in Integrated Navigation Systems (RAINS), Bangalore, India, 6–7 April 2016.
5. MacQueen, J. Some methods for classification and analysis of multivariate observations. In Proceedings of the 5th Berkeley Symp Math Statis and Probability, Berkeley, CA, USA, 21 June–18 July 1965.
6. Ruspini, E.H.; Bezdek, J.C.; Keller, J.M. Fuzzy Clustering: A Historical Perspective. IEEE Comput. Intell. Mag. **2019**, 14, 45–55.
7. Lee, G.M.; Gao, X. A Hybrid Approach Combining Fuzzy c-Means-Based Genetic Algorithm and Machine Learning for Predicting Job Cycle Times for Semiconductor Manufacturing. Appl. Sci. **2021**, 11, 7428.
8. Lee, S.J.; Song, D.H.; Kim, K.B.; Park, H.J. Efficient Fuzzy Image Stretching for Automatic Ganglion Cyst Extraction Using Fuzzy C-Means Quantization. Appl. Sci. **2021**, 11, 12094.
9. Ghosh, S.; Kumar, S. Comparative Analysis of K-Means and Fuzzy C-Means Algorithms. Int. J. Adv. Comput. Sci. Appl. **2013**, 4, 35–39.
10. Garey, M.R.; Johnson, D.S. Computers and Intractability: A Guide to the Theory of NP-Completeness; W. H. Freeman & Co.: New York, NY, USA, 1979; pp. 109–120.
11. Bezdek, J.C. Pattern Recognition with Fuzzy Objective Function Algorithms; Plenum Press: New York, NY, USA, 1981; pp. 43–93.
12. Stetco, A.; Zeng, X.J.; Keane, J. Fuzzy C-means++: Fuzzy C-means with effective seeding initialization. Expert Syst. Appl. **2015**, 42, 7541–7548.
13. Wu, Z.; Chen, G.; Yao, J. The Stock Classification Based on Entropy Weight Method and Improved Fuzzy C-means Algorithm. In Proceedings of the 2019 4th International Conference on Big Data and Computing, Guangzhou, China, 10–12 May 2019.
14. Liu, Q.; Liu, J.; Li, M.; Zhou, Y. Approximation algorithms for fuzzy C-means problem based on seeding method. Theor. Comput. Sci. **2021**, 885, 146–158.
15. Arthur, D.; Vassilvitskii, S. k-means++: The Advantages of Careful Seeding. In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, New Orleans, LA, USA, 7–9 January 2007.
16. Cai, W.; Chen, S.; Zhang, D. Fast and robust fuzzy c-means clustering algorithms incorporating local information for image segmentation. Pattern Recognit. **2007**, 40, 825–838.
17. Al-Ayyoub, M.; Al-andoli, M.; Jararweh, Y.; Smadi, M.; Gupta, B.B. Improving fuzzy C-mean-based community detection in social networks using dynamic parallelism. Comput Elect. Eng. **2018**, 74, 533–546.
18. Hashemzadeh, M.; Oskouei, A.G.; Farajzadeh, N. New fuzzy C-means clustering method based on feature-weight and cluster-weight learning. Appl. Soft. Comput. **2019**, 78, 324–345.
19. Khang, T.D.; Vuong, N.D.; Tran, M.-K.; Fowler, M. Fuzzy C-Means Clustering Algorithm with Multiple Fuzzification Coefficients. Algorithms **2020**, 13, 158.
20. Khang, T.D.; Tran, M.-K.; Fowler, M. A Novel Semi-Supervised Fuzzy C-Means Clustering Algorithm Using Multiple Fuzzification Coefficients. Algorithms **2021**, 14, 258.
21. Naldi, M.C.; Campello, R.J.G.B. Comparison of distributed evolutionary k-means clustering algorithms. Neurocomputing **2015**, 163, 78–93.
22. Pérez, J.; Almanza, N.N.; Romero, D. Balancing effort and benefit of K-means clustering algorithms in Big Data realms. PLoS ONE **2018**, 13, e0201874.
23. Selim, S.Z.; Ismail, M.A. K-Means-Type Algorithms: A Generalized Convergence Theorem and Characterization of Local Optimality. IEEE Trans. Pattern Anal. Mach. Intell. **1984**, PAMI-6, 81–87.
24. Jancey, R.C. Multidimensional group analysis. Aust. J. Bot. **1966**, 14, 127–130.
25. Zadeh, L.A. Fuzzy sets. Inf. Control **1965**, 8, 338–353.
26. Bellman, R.; Kalaba, R.; Zadeh, L.A. Abstraction and pattern classification. J. Math. Anal. Appl. **1966**, 13, 1–7.
27. Ruspini, E.H. A new approach to clustering. Inf. Control **1969**, 15, 22–32.
28. Dunn, J.C. A Fuzzy Relative of the ISODATA Process and Its Use in Detecting Compact Well-Separated Clusters. J. Cybern. **1974**, 3, 32–57.
29. UCI Machine Learning Repository, University of California. Available online: https://archive.ics.uci.edu/ml/index.php (accessed on 26 January 2022).
30. Rosen, K.H. Discrete Mathematics and Its Applications; McGraw-Hill Education: New York, NY, USA, 2018; pp. 90–98.
31. McGeoch, C.C. A Guide to Experimental Algorithmics; Cambridge University Press: Cambridge, UK, 2012; pp. 17–114.

**Figure 1.** Initial and final centroids of the standard K-Means and FCM algorithms with the URBAN dataset in Example 1. Panel (**a**) shows the randomly generated initial centroids; (**b**) contains the final centroids with the solution of the standard FCM algorithm, considering the random initial centroids; (**c**) displays the final centroids of the K-Means algorithm receiving random centroids; (**d**) displays the final centroids of FCM with initial centroids generated after the convergence of K-Means; and (**e**) presents a comparison of the solutions of the FCM algorithm, whose initial centroids are random and generated by K-Means.

**Figure 2.** Initial and final centroids of the standard K-Means and FCM algorithms with the URBAN dataset in Example 2. Panel (**a**) shows the randomly generated initial centroids; (**b**) contains the final centroids with the solution of the standard FCM algorithm, considering the random initial centroids; (**c**) visualizes the final centroids with the solution of the K-Means algorithm that receives random centroids; (**d**) shows the final centroids of FCM with initial centroids generated after K-Means convergence; and (**e**) presents a comparison of solutions of the FCM and K-Means algorithms.

Id | Name | Type | n | d | Size Indicator n*d |
---|---|---|---|---|---|
1 | WDBC | Real | 569 | 30 | 17,070 |
2 | ABALONE | Real | 4177 | 7 | 29,239 |
3 | SPAM | Real | 4601 | 57 | 262,257 |
4 | URBAN | Real | 360,177 | 2 | 720,354 |
5 | 1m2d | Synthetic | 1,000,000 | 2 | 2,000,000 |

Standard FCM versus HOFCM

P | c | WDBC % time | WDBC % J_{m} | ABALONE % time | ABALONE % J_{m} | SPAM % time | SPAM % J_{m} | URBAN % time | URBAN % J_{m} | 1m2d % time | 1m2d % J_{m} |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | 2 | 80.88 | 0.00 | 64.14 | 0.0035 | 54.37 | −0.001 | 46.12 | 4.12 | 74.95 | 0.03 |
2 | 4 | 87.12 | 0.02 | 56.15 | −0.02 | 75.36 | 6.53 | 76.33 | −0.001 | 87.81 | 0.03 |
3 | 6 | 67.83 | −2.19 | 52.59 | 0.00 | 92.58 | 22.75 | 68.18 | 0.07 | 45.26 | 0.36 |
4 | 8 | 64.30 | −1.17 | 86.60 | 0.08 | **93.94** | 36.90 | 74.60 | 1.42 | 85.56 | 0.22 |
5 | 10 | 57.55 | 3.03 | 75.61 | 0.09 | 86.22 | 50.69 | 70.98 | 1.85 | 67.75 | −0.05 |
6 | 14 | 78.99 | 4.50 | 37.00 | −0.11 | 77.20 | 49.30 | 70.38 | 4.26 | 89.65 | −0.27 |
7 | 18 | 59.04 | 12.39 | 2.71 | −0.60 | 79.80 | 56.24 | 68.45 | 7.81 | 84.07 | 0.05 |
8 | 26 | 64.99 | 27.31 | 36.78 | −2.05 | 84.88 | **68.31** | 44.38 | 10.73 | 88.00 | 0.02 |

**Table 3.** Average percentages of time reduction and value of the objective function in Experiment II; HOFCM versus FCM++.

P | c | WDBC % time | WDBC % J_{m} | ABALONE % time | ABALONE % J_{m} | SPAM % time | SPAM % J_{m} | URBAN % time | URBAN % J_{m} | 1m2d % time | 1m2d % J_{m} |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | 2 | 66.87 | 0.00 | 59.38 | 0.00 | 41.67 | 0.00 | 18.19 | 0.00 | 32.08 | 0.03 |
2 | 4 | 78.59 | 0.02 | 36.47 | −0.02 | 53.08 | 0.01 | 52.36 | 0.00 | 73.61 | 0.03 |
3 | 6 | 69.42 | −0.52 | 54.03 | 0.04 | 54.04 | **5.48** | 54.83 | 1.86 | 37.78 | −0.01 |
4 | 8 | 57.86 | −0.20 | 80.64 | 0.05 | 60.42 | 1.97 | 70.42 | 2.20 | 84.36 | 0.02 |
5 | 10 | 57.29 | 1.76 | 68.80 | 0.09 | 71.62 | **−2.33** | 63.27 | 1.70 | 62.20 | −0.07 |
6 | 14 | 45.97 | 0.43 | 39.00 | 0.04 | 25.56 | 2.77 | 41.40 | 2.10 | 86.60 | −0.19 |
7 | 18 | 37.01 | 0.18 | −7.22 | −0.06 | 32.72 | 0.94 | 40.44 | 3.57 | 80.08 | −0.03 |
8 | 26 | 27.58 | 2.92 | 38.48 | −0.71 | 42.08 | 2.24 | 13.33 | 0.62 | 82.99 | 0.09 |

**Table 4.** Average percentages of time reduction and value of the objective function in Experiment II; HOFCM versus FCM-KMeans.

P | c | WDBC % time | WDBC % J_{m} | ABALONE % time | ABALONE % J_{m} | SPAM % time | SPAM % J_{m} | URBAN % time | URBAN % J_{m} | 1m2d % time | 1m2d % J_{m} |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | 2 | 20.05 | 0.00 | 15.52 | 0.00 | 33.95 | 0.00 | 30.15 | 0.00 | 55.79 | 0.03 |
2 | 4 | 23.74 | 0.00 | 9.72 | 0.01 | 58.11 | 6.53 | 28.55 | 0.00 | 47.73 | 0.00 |
3 | 6 | 32.29 | −2.51 | −10.98 | 0.00 | 90.88 | 22.75 | 68.57 | 0.97 | −13.94 | 0.42 |
4 | 8 | 47.87 | −1.28 | 50.98 | −0.02 | 72.58 | 14.66 | 58.99 | 0.14 | 53.89 | −0.02 |
5 | 10 | 66.05 | 3.09 | 60.49 | 0.12 | 41.19 | 33.65 | 49.89 | 3.56 | −63.13 | 0.05 |
6 | 14 | 84.13 | 4.52 | 46.83 | 0.69 | 57.67 | 50.13 | 63.45 | 2.17 | 78.87 | 0.06 |
7 | 18 | 84.21 | 11.99 | 24.45 | −0.06 | 73.88 | 57.61 | 42.56 | 9.40 | 41.81 | 0.02 |
8 | 26 | 80.98 | 29.01 | 41.00 | −0.81 | 83.03 | 66.86 | 5.55 | 3.99 | 45.48 | −0.01 |

**Table 5.** Average percentages of time reduction and value of the objective function in Experiment II; HOFCM versus NFCM.

P | c | WDBC % time | WDBC % J_{m} | ABALONE % time | ABALONE % J_{m} | SPAM % time | SPAM % J_{m} | URBAN % time | URBAN % J_{m} | 1m2d % time | 1m2d % J_{m} |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | 2 | 74.65 | 0.01 | 61.84 | 0.00 | 47.58 | 0.00 | 18.43 | 0.00 | 19.79 | 0.03 |
2 | 4 | 76.70 | 3.74 | 44.02 | −0.02 | 55.53 | 4.66 | 60.83 | 0.00 | 82.07 | 0.03 |
3 | 6 | 61.78 | −1.09 | 51.07 | 0.02 | 75.69 | 4.77 | 54.85 | 2.34 | 30.90 | 0.13 |
4 | 8 | 55.55 | −0.30 | 82.67 | 0.76 | 81.25 | 9.07 | 72.68 | 1.81 | 87.45 | 0.01 |
5 | 10 | 48.54 | 3.21 | 71.65 | 0.46 | 55.00 | 20.87 | 56.78 | 3.77 | 59.43 | 0.16 |
6 | 14 | 35.87 | 1.15 | 53.76 | 0.17 | 19.88 | 12.98 | 58.73 | 2.38 | 86.76 | −0.15 |
7 | 18 | 1.49 | 2.63 | −4.36 | −0.40 | 34.42 | 1.35 | 44.02 | 3.29 | 76.42 | 0.12 |
8 | 26 | 20.09 | 4.98 | 26.10 | −2.02 | 47.18 | 5.33 | 22.79 | 3.47 | 84.93 | 0.14 |

Dataset | Standard FCM | FCM-KMeans | FCM++ | NFCM | HOFCM |
---|---|---|---|---|---|
WDBC | 64,516.85 | 116,784.57 | 34,037.94 | 28,395.16 | 21,689.84 |
ABALONE | 115,702.20 | 123,930.39 | 113,680.29 | 105,730.07 | 76,996.63 |
SPAM | 1,746,301.34 | 1,355,655.16 | 482,874.71 | 513,205.42 | 289,610.79 |
URBAN | 3,080,009.70 | 1,886,664.03 | 1,885,697.60 | 2,104,360.89 | 1,321,799.74 |
1m2d | 11,181,675.94 | 3,091,877.12 | 8,385,299.13 | 8,729,514.14 | 1,592,132.11 |
The total sum of solution time | 16,188,206.03 | 6,574,911.27 | 10,901,589.67 | 11,481,205.68 | 3,302,229.11 * |
The number of times by which the HOFCM algorithm is faster | 4.90 | 1.99 | 3.30 | 3.48 | |
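The last row of the table is the ratio between each algorithm's total solution time and the HOFCM total. A short check of that arithmetic, using the totals exactly as listed (the variable names are illustrative), is:

```python
# Total solution time per algorithm, copied from the table's "total sum" row.
totals = {
    "Standard FCM": 16188206.03,
    "FCM-KMeans": 6574911.27,
    "FCM++": 10901589.67,
    "NFCM": 11481205.68,
    "HOFCM": 3302229.11,
}

# Speed-up factor: total time of each algorithm divided by the HOFCM total.
speedup = {
    name: round(total / totals["HOFCM"], 2)
    for name, total in totals.items()
    if name != "HOFCM"
}
print(speedup)
# {'Standard FCM': 4.9, 'FCM-KMeans': 1.99, 'FCM++': 3.3, 'NFCM': 3.48}
```

The rounded ratios agree with the factors reported in the table.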


© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Pérez-Ortega, J.; Roblero-Aguilar, S.S.; Almanza-Ortega, N.N.; Frausto Solís, J.; Zavala-Díaz, C.; Hernández, Y.; Landero-Nájera, V.
Hybrid Fuzzy C-Means Clustering Algorithm Oriented to Big Data Realms. *Axioms* **2022**, *11*, 377.
https://doi.org/10.3390/axioms11080377
