In this paper, a data clustering method named consensus fuzzy k-modes clustering is proposed to improve clustering performance on categorical data. At the same time, a coupling DNA-chain-hypergraph P system is constructed to carry out the clustering process. This P system can prevent the clustering algorithm from falling into a local optimum and realizes the clustering process with implicit parallelism. The consensus fuzzy k-modes algorithm combines the advantages of the fuzzy k-modes algorithm, the weight fuzzy k-modes algorithm, and the genetic fuzzy k-modes algorithm. The fuzzy k-modes algorithm produces a soft partition, which is closer to reality, but treats all variables equally. The weight fuzzy k-modes algorithm introduces a weight vector that strengthens basic k-modes clustering by assigning higher weights to features that are useful in the analysis. However, these two methods only improve the k-modes algorithm itself. The genetic k-modes algorithm was therefore proposed, which applies genetic operations in the clustering process. In this paper, we examine these three kinds of k-modes algorithms and further introduce DNA genetic optimization operations in the final consensus process. Finally, we conduct experiments on seven UCI datasets and compare the clustering results with those of four other categorical clustering algorithms. The experimental results and statistical test results show that our method obtains better clustering results than the compared algorithms.
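The abstract does not spell out the update rules of the base algorithm. As background only, the following is a minimal sketch of the widely used fuzzy k-modes iteration of Huang and Ng, with simple matching dissimilarity, alternating a membership update and a mode update. It is not the consensus method or the P system proposed here; the function names, the fuzziness exponent `alpha = 1.1`, the iteration cap, and the tie-breaking (first category with the highest score wins) are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

Record = Tuple[str, ...]

def mismatch(a: Sequence[str], b: Sequence[str]) -> int:
    """Simple matching dissimilarity: number of attributes on which a and b differ."""
    return sum(x != y for x, y in zip(a, b))

def update_memberships(data: List[Record], modes: List[Record], alpha: float) -> List[List[float]]:
    """Return W, where W[l][i] is the membership of object i in cluster l."""
    k, n = len(modes), len(data)
    W = [[0.0] * n for _ in range(k)]
    for i, x in enumerate(data):
        d = [mismatch(z, x) for z in modes]
        if 0 in d:
            # The object coincides with a mode: it belongs fully to that cluster.
            W[d.index(0)][i] = 1.0
            continue
        for l in range(k):
            W[l][i] = 1.0 / sum((d[l] / d[h]) ** (1.0 / (alpha - 1.0)) for h in range(k))
    return W

def update_modes(data: List[Record], W: List[List[float]], alpha: float) -> List[Record]:
    """For each cluster and attribute, pick the category with the largest sum of W**alpha."""
    modes = []
    for l in range(len(W)):
        mode = []
        for j in range(len(data[0])):
            scores = {}
            for i, x in enumerate(data):
                scores[x[j]] = scores.get(x[j], 0.0) + W[l][i] ** alpha
            mode.append(max(scores, key=scores.get))
        modes.append(tuple(mode))
    return modes

def fuzzy_k_modes(data: List[Record], initial_modes: List[Record],
                  alpha: float = 1.1, max_iter: int = 100):
    """Alternate membership and mode updates until the modes stop changing."""
    modes = list(initial_modes)
    W = update_memberships(data, modes, alpha)
    for _ in range(max_iter):
        new_modes = update_modes(data, W, alpha)
        if new_modes == modes:
            break
        modes = new_modes
        W = update_memberships(data, modes, alpha)
    return modes, W

if __name__ == "__main__":
    data = [("a", "x"), ("a", "x"), ("a", "y"), ("b", "z"), ("b", "z"), ("b", "y")]
    modes, W = fuzzy_k_modes(data, [("a", "x"), ("b", "z")])
    print(modes)
```

Each object's memberships sum to 1 across clusters, which is the soft partition the abstract refers to; object `("a", "y")` belongs mostly, but not exclusively, to the first cluster. A weight fuzzy k-modes variant would, under the usual formulation, replace `mismatch` with a per-attribute weighted sum.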
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.