Fuzzy C-Means Clustering Algorithm with Multiple Fuzzification Coefficients
Abstract
1. Introduction
2. Preliminaries
2.1. Standard Fuzzy C-Means (FCM) Algorithm
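- $u_{ik}$ is the membership value of the element $X_i$ in a cluster with center $V_k$, $1 \le i \le N$, $1 \le k \le C$;
- $0 \le u_{ik} \le 1$, $1 \le i \le N$, $1 \le k \le C$; and $\sum_{k=1}^{C} u_{ik} = 1$ for each $i$.
- The larger $u_{ik}$ is, the higher the degree of confidence that the element $X_i$ belongs to the cluster $k$.
- $d_{ik}^{2}$ is the squared distance between the element $X_i$ and the cluster center $V_k$.
- $m$ is the fuzzification coefficient of the algorithm.
- Step 1: Initialize value for $V$, let $l = 0$, set $\varepsilon > 0$ and $m > 1$.
- Step 2: At the $l$-th loop, update $U$ according to the formula: $u_{ik} = \left[ \sum_{j=1}^{C} \left( \frac{d_{ik}}{d_{ij}} \right)^{\frac{2}{m-1}} \right]^{-1}$
- Step 3: Update $V$ for the next step $(l + 1)$, according to the formula: $V_k = \frac{\sum_{i=1}^{N} u_{ik}^{m} X_i}{\sum_{i=1}^{N} u_{ik}^{m}}$
- Step 4: If $\lVert V^{(l)} - V^{(l+1)} \rVert \le \varepsilon$ then go to Step 5; otherwise, let $l = l + 1$ and return to Step 2.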
- Step 5: End.
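The five steps can be sketched in NumPy as follows. This is a minimal illustration rather than the authors' implementation: it assumes Euclidean distance, the usual FCM updates $u_{ik} \propto d_{ik}^{-2/(m-1)}$ and $V_k = \sum_i u_{ik}^m X_i / \sum_i u_{ik}^m$, and a stopping test on the movement of the centers. The function name `fcm`, the optional `V0` argument, the `max_iter` cap and the random choice of initial centers are all assumptions introduced for the example.

```python
import numpy as np

def fcm(X, C, m=2.0, eps=1e-5, max_iter=300, V0=None, seed=0):
    """Sketch of standard FCM; returns memberships U (N x C) and centers V (C x D)."""
    N = X.shape[0]
    if V0 is None:
        # Step 1 (assumption): start from C distinct data points chosen at random.
        rng = np.random.default_rng(seed)
        V = X[rng.choice(N, size=C, replace=False)].astype(float)
    else:
        V = np.asarray(V0, dtype=float).copy()
    for _ in range(max_iter):
        # Squared distances d_ik^2, clipped so a point sitting on a center is safe.
        d2 = np.maximum(((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2), 1e-12)
        # Step 2: u_ik is proportional to (d_ik^2)^(-1/(m-1)), normalised per element.
        inv = d2 ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
        # Step 3: each center is the u^m-weighted mean of the data.
        W = U ** m
        V_new = (W.T @ X) / W.sum(axis=0)[:, None]
        # Step 4: stop once the centers move by at most eps.
        shift = np.linalg.norm(V_new - V)
        V = V_new
        if shift <= eps:
            break
    return U, V
```

A hard partition, as needed by most validity indices, can be read off with `U.argmax(axis=1)`.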
2.2. Interval Type-2 Fuzzy C-Means (FCMT2I) Clustering
- The algorithm is sensitive to noise and outliers;
- Clustering is often inaccurate for elements located at the boundary between clusters;
- There is no specific criterion for selecting the value of the parameter m, which is usually chosen by trial and error over multiple runs.
3. Fuzzy C-Means Clustering with Multiple Fuzzification Coefficients (MC-FCM)
3.1. The Fuzzification Coefficients
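- Step 1: Calculate the distance $d_{ij}$ between two elements $X_i$, $X_j$; $1 \le i, j \le N$.
- Step 2: For each $i$, rearrange $d_{ij}$ over the index $j$ to obtain $d_{ij}^{*}$ in non-decreasing order. Calculate $\delta_i = \sum_{j=1}^{N/C} d_{ij}^{*}$.
- Step 3: Calculate $\delta_{\min} = \min_{i} \delta_i$ and $\delta_{\max} = \max_{i} \delta_i$. For each $i$, $1 \le i \le N$, calculate $m_i = m_L + (m_U - m_L) \left( \frac{\delta_i - \delta_{\min}}{\delta_{\max} - \delta_{\min}} \right)^{\alpha}$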
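The three steps can be illustrated with a short NumPy sketch. It rests on assumptions that should be checked against the published equations: the per-element density measure $\delta_i$ is taken to be the sum of the $N/C$ smallest sorted distances from $X_i$ (its zero self-distance included), and $m_i$ is taken to be a power-law interpolation between $m_L$ and $m_U$ with exponent $\alpha$. The function name and the handling of the degenerate case $\delta_{\max} = \delta_{\min}$ are hypothetical.

```python
import numpy as np

def fuzzification_coefficients(X, C, m_L, m_U, alpha):
    """Sketch: one coefficient m_i per element, between m_L and m_U."""
    N = X.shape[0]
    # Step 1: pairwise Euclidean distances d_ij.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Step 2: sort each row and sum the N // C smallest distances
    # (assumption: the zero self-distance is counted among them).
    n_near = max(N // C, 1)
    delta = np.sort(D, axis=1)[:, :n_near].sum(axis=1)
    # Step 3: rescale delta to [0, 1] and interpolate with exponent alpha.
    span = delta.max() - delta.min()
    if span == 0.0:
        # Assumption: with identical densities every element gets m_L.
        return np.full(N, float(m_L))
    t = (delta - delta.min()) / span
    return m_L + (m_U - m_L) * t ** alpha
```

Under these assumptions, elements in dense regions (small $\delta_i$) receive coefficients near $m_L$ and isolated elements receive coefficients near $m_U$; $\alpha$ controls how quickly $m_i$ grows between the two.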
3.2. Derivation of the MC-FCM Clustering Algorithm
- Step 1: Initialize value for $V$, let $l = 0$, set $\varepsilon > 0$.
- Step 2: At the $l$-th loop, update $U$ according to Equation (10).
- Step 3: Update $V$ for the next step $(l + 1)$, according to Equation (7).
- Step 4: If $\lVert V^{(l)} - V^{(l+1)} \rVert \le \varepsilon$, then go to Step 5; otherwise, let $l = l + 1$ and return to Step 2.
- Step 5: End.
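A sketch of this loop is given below. It assumes that the updates referred to as Equations (10) and (7) are the standard FCM updates with the single exponent $m$ replaced by the per-element coefficient $m_i$, that is $u_{ik} \propto d_{ik}^{-2/(m_i - 1)}$ and $V_k = \sum_i u_{ik}^{m_i} X_i / \sum_i u_{ik}^{m_i}$; that reading follows from minimizing $\sum_i \sum_k u_{ik}^{m_i} d_{ik}^2$ under $\sum_k u_{ik} = 1$, but it is an assumption here. The name `mc_fcm`, the `max_iter` cap and the requirement to pass initial centers `V0` are hypothetical choices for the example.

```python
import numpy as np

def mc_fcm(X, m, V0, eps=1e-5, max_iter=300):
    """Sketch of FCM with one fuzzification coefficient m[i] > 1 per element."""
    m = np.asarray(m, dtype=float)
    V = np.asarray(V0, dtype=float).copy()        # Step 1: initial centers
    expo = (-1.0 / (m - 1.0))[:, None]            # per-element exponent, shape (N, 1)
    for _ in range(max_iter):
        d2 = np.maximum(((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2), 1e-12)
        # Step 2: memberships; dividing by the row minimum keeps the powers in (0, 1].
        inv = (d2 / d2.min(axis=1, keepdims=True)) ** expo
        U = inv / inv.sum(axis=1, keepdims=True)
        # Step 3: centers as means weighted by u_ik ** m_i.
        W = U ** m[:, None]
        V_new = (W.T @ X) / W.sum(axis=0)[:, None]
        # Step 4: stop once the centers move by at most eps.
        shift = np.linalg.norm(V_new - V)
        V = V_new
        if shift <= eps:
            break
    return U, V
```

Passing a constant vector for `m` reduces the sketch to standard FCM, which gives a simple sanity check.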
4. Evaluation of the Proposed MC-FCM Algorithm
- The Davies–Bouldin (DB) index is based on a ratio involving within-group and between-group distances: $DB = \frac{1}{C} \sum_{j=1}^{C} D_j$, where $D_j = \max_{k \ne j} D_{j,k}$, and $D_{j,k}$ is the within-to-between cluster spread for the $j$-th and $k$-th clusters, i.e., $D_{j,k} = (\bar{d}_j + \bar{d}_k) / d_{j,k}$, where $\bar{d}_j$ and $\bar{d}_k$ are the average within-group distances for the $j$-th and $k$-th clusters, respectively, and $d_{j,k}$ is the inter-group distance between these clusters. These distances are defined as $\bar{d}_j = \frac{1}{N_j} \sum_{X_i \in C_j} \lVert X_i - V_j \rVert$, with $N_j$ the number of elements in the $j$-th cluster $C_j$, and $d_{j,k} = \lVert V_j - V_k \rVert$. Here, $D_j$ represents the worst-case within-to-between cluster spread involving the $j$-th cluster. Minimizing $D_j$ for all clusters minimizes the DB index. Hence, good partitions, which consist of compact and well-separated clusters, are distinguished by small values of DB.
- The Alternative Silhouette Width Criterion (ASWC) is the ratio between the inter-group distance and the intra-group distance: $ASWC = \frac{1}{N} \sum_{i=1}^{N} s_{X_i}$, where $s_{X_i} = \frac{b_{p,i}}{a_{p,i} + \epsilon}$. Suppose that the $i$-th element of the dataset, $X_i$, belongs to a given cluster $p \in \{1, \dots, C\}$. Then $a_{p,i}$ is the average distance of $X_i$ to all other elements in this cluster, $d_{q,i}$ is the average distance of $X_i$ to all elements in another cluster $q$, with $q \ne p$, $b_{p,i}$ is the minimum of $d_{q,i}$ computed over $q = 1, \dots, C$ with $q \ne p$, and $\epsilon$ is a small constant (e.g., $10^{-6}$ for normalized data) used to avoid division by zero when $a_{p,i} = 0$. Large ASWC values indicate good partitions.
- The PBM index is also based on the within-group and between-group distances: $PBM = \left( \frac{1}{C} \cdot \frac{E_1}{E_C} \cdot D_C \right)^{2}$, where $E_1$ denotes the sum of distances between the elements and the grand mean of the data, $E_C$ represents the sum of within-group distances and $D_C$ is the maximum distance between group centroids. The best partition is indicated when PBM is maximized.
- The Rand index (RI) can be seen as an absolute criterion that allows the use of properly labeled datasets for performance assessment of clustering results. This simple and intuitive index compares two hard partition matrices ($R$ and $Q$) of the same dataset. The reference partition, $R$, encodes the class labels, while the partition $Q$ divides the data into clusters and is the one to be evaluated. We have $RI = \frac{a + d}{N(N-1)/2}$, where $a$ denotes the number of pairs of data elements belonging to the same class in $R$ and to the same cluster in $Q$, $d$ denotes the number of pairs of data elements belonging to different classes in $R$ and to different clusters in $Q$, and $N(N-1)/2$ is the total number of pairs. Large RI values indicate that the clustering is compatible with the given class labels.
- Mean accuracy (MA) compares, for each cluster, the number of elements assigned to that cluster after clustering with the actual number of elements in that cluster. Large MA values often indicate good clustering.
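The first four indices can be sketched for a hard partition as follows. The sketch assumes Euclidean distance, integer labels in $0, \dots, C-1$ (for instance obtained from fuzzy memberships with `U.argmax(axis=1)`), non-empty clusters, and the common textbook forms $DB = \frac{1}{C}\sum_j \max_{k \ne j} (\bar{d}_j + \bar{d}_k)/d_{j,k}$, $ASWC = \frac{1}{N}\sum_i b_{p,i}/(a_{p,i} + \epsilon)$, $PBM = (\frac{1}{C} \cdot \frac{E_1}{E_C} \cdot D_C)^2$ and $RI$ as the fraction of element pairs on which the two partitions agree. The function names are hypothetical, and MA is omitted because its exact formula is not recoverable here.

```python
import numpy as np

def davies_bouldin(X, labels, V):
    """Mean over clusters of the worst within-to-between spread (smaller is better)."""
    C = V.shape[0]
    spread = np.array([np.linalg.norm(X[labels == k] - V[k], axis=1).mean()
                       for k in range(C)])
    sep = np.linalg.norm(V[:, None, :] - V[None, :, :], axis=2)
    np.fill_diagonal(sep, np.inf)            # exclude the pair (j, j)
    return float(((spread[:, None] + spread[None, :]) / sep).max(axis=1).mean())

def aswc(X, labels, eps=1e-6):
    """Mean of b / (a + eps) over all elements (larger is better)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    clusters = np.unique(labels)
    scores = []
    for i in range(X.shape[0]):
        same = labels == labels[i]
        same[i] = False                      # "all other elements" of the own cluster
        a = D[i, same].mean() if same.any() else 0.0
        b = min(D[i, labels == q].mean() for q in clusters if q != labels[i])
        scores.append(b / (a + eps))
    return float(np.mean(scores))

def pbm(X, labels, V):
    """((1/C) * (E_1 / E_C) * D_C) ** 2 (larger is better)."""
    e_1 = np.linalg.norm(X - X.mean(axis=0), axis=1).sum()
    e_c = np.linalg.norm(X - V[labels], axis=1).sum()
    d_c = np.linalg.norm(V[:, None, :] - V[None, :, :], axis=2).max()
    return float((e_1 / e_c * d_c / V.shape[0]) ** 2)

def rand_index(ref, pred):
    """Fraction of element pairs treated the same way by both partitions."""
    ref, pred = np.asarray(ref), np.asarray(pred)
    iu = np.triu_indices(len(ref), k=1)      # each unordered pair once
    same_ref = (ref[:, None] == ref[None, :])[iu]
    same_pred = (pred[:, None] == pred[None, :])[iu]
    return float((same_ref == same_pred).mean())
```

For four points at the corners of a 4 by 2 rectangle, split into its left and right edges with centers at the edge midpoints, these definitions give DB = 0.5 and PBM = 20.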
- (i) Perform FCM with m = 2 several times and record the run with the best MA index result;
- (ii) Perform FCM with changing m and record the run with the best MA index result;
- (iii) Perform MC-FCM with changing mL, mU and α and record the run with the best MA index result;
- (iv) Perform MC-FCM with the same mL and mU as in (iii), adjust α and record the run with the best DB index result;
- (v) Perform FCMT2I several times with the same mL and mU as in (iii) and record the run with the best MA index result.
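Each of the five scenarios follows the same pattern: repeat a clustering run over a set of parameter settings and keep the run that scores best on one index. A generic helper for that pattern might look like the sketch below. The names `best_run`, `run`, `settings` and `score` are hypothetical, and the sketch assumes that `run` accepts a `seed` keyword plus the parameters of one setting; it does not reproduce the authors' experimental code.

```python
def best_run(run, settings, score, n_repeats=5, maximize=True):
    """Return (best_score, best_params, best_result) over all settings and repeats."""
    best = None
    for params in settings:
        for rep in range(n_repeats):
            # Each repeat uses a different seed so that initial centers differ.
            result = run(seed=rep, **params)
            s = score(result)
            better = best is None or (s > best[0] if maximize else s < best[0])
            if better:
                best = (s, params, result)
    return best
```

With `maximize=True` and an MA scorer this matches scenarios (i), (ii), (iii) and (v); scenario (iv) would use a DB scorer with `maximize=False`, since smaller DB values indicate better partitions.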
5. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
Dataset | Samples | Attributes | Classes | Description |
---|---|---|---|---|
ECOLI | 336 | 7 | 8 | This dataset consists of 7 characteristics of 8 E. coli bacteria types used to identify them. |
HEART | 303 | 13 | 2 | This dataset consists of 13 symptoms used to determine if one has heart disease. |
WDBC | 569 | 32 | 2 | This dataset consists of 32 metrics obtained from X-ray images of breast cancer tumors used to determine if one has breast cancer. |
IRIS | 150 | 4 | 3 | This dataset consists of 4 characteristics of 3 types of irises used to identify them. |
WINE | 178 | 13 | 3 | This dataset consists of 13 chemical constituents in 3 types of Italian wine used to identify them. |

Algorithms | DB | ASWC | PBM | RI | MA |
---|---|---|---|---|---|
FCM, m = 2 | 2.5855 | 0.8183 | 0.0098 | 0.8403 | 0.7652 |
FCM, m = 6.1 | 2.8955 | 0.8657 | 0.0091 | 0.8604 | 0.8077 |
MC-FCM, α = 0.1 | 2.8329 | 0.8995 | 0.0084 | 0.8699 | 0.8244 |
MC-FCM, α = 1.9 | 2.4021 | 0.895 | 0.0089 | 0.8644 | 0.8125 |
FCMT2I | 3.4561 | 0.8581 | 0.0091 | 0.8546 | 0.8184 |

Algorithms | DB | ASWC | PBM | RI | MA |
---|---|---|---|---|---|
FCM, m = 2 | 0.7445 | 0.8182 | 0.8118 | 0.5154 | 0.5926 |
FCM, m = 3 | 0.9044 | 0.8159 | 0.8102 | 0.5213 | 0.6074 |
MC-FCM, α = 0.8 | 0.7319 | 0.8140 | 0.8124 | 0.5229 | 0.6148 |
MC-FCM, α = 1.7 | 0.7306 | 0.8159 | 0.8102 | 0.5213 | 0.6074 |
FCMT2I | 0.7684 | 0.8166 | 0.8186 | 0.5168 | 0.5963 |

Algorithms | DB | ASWC | PBM | RI | MA |
---|---|---|---|---|---|
FCM, m = 2 | 1.2348 | 2.2109 | 23.036 | 0.7504 | 0.8541 |
FCM, m = 6 | 1.0618 | 2.0409 | 22.566 | 0.7707 | 0.8682 |
MC-FCM, α = 0.7 | 0.6508 | 1.588 | 20.147 | 0.8365 | 0.9104 |
MC-FCM, α = 0.4 | 0.6298 | 1.4938 | 19.897 | 0.8216 | 0.9051 |
FCMT2I | 0.7847 | 1.588 | 20.147 | 0.8365 | 0.9104 |

Algorithms | DB | ASWC | PBM | RI | MA |
---|---|---|---|---|---|
FCM, m = 2 | 3.4835 | 1.7587 | 0.1574 | 0.8797 | 0.8933 |
FCM, m = 9 | 2.0737 | 1.6771 | 0.1498 | 0.9124 | 0.9267 |
MC-FCM, α = 2.5 | 2.1388 | 1.6824 | 0.1471 | 0.9195 | 0.9333 |
MC-FCM, α = 9.9 | 2.0714 | 1.6794 | 0.1489 | 0.8797 | 0.92 |
FCMT2I | 1.3406 | 1.7548 | 0.1377 | 0.8464 | 0.8533 |

Algorithms | DB | ASWC | PBM | RI | MA |
---|---|---|---|---|---|
FCM, m = 2 | 2.6983 | 1.2521 | 2.3675 | 0.7105 | 0.6854 |
FCM, m = 10 | 2.2146 | 1.3040 | 2.3711 | 0.7204 | 0.7079 |
MC-FCM, α = 0.7 | 3.7023 | 1.2867 | 2.2668 | 0.7363 | 0.7303 |
MC-FCM, α = 8.7 | 1.2995 | 1.3131 | 2.3649 | 0.7187 | 0.7022 |
FCMT2I | 2.0272 | 1.3308 | 2.2763 | 0.7254 | 0.6910 |
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Khang, T.D.; Vuong, N.D.; Tran, M.-K.; Fowler, M. Fuzzy C-Means Clustering Algorithm with Multiple Fuzzification Coefficients. Algorithms 2020, 13, 158. https://doi.org/10.3390/a13070158