A Fast K-prototypes Algorithm Using Partial Distance Computation
Abstract: The k-means algorithm is one of the most popular and widely used clustering algorithms; however, it is limited to numerical data. The k-prototypes algorithm is a well-known algorithm for handling both numerical and categorical data. However, there have been no studies on accelerating it. In this paper, we propose a new, fast k-prototypes algorithm that gives the same results as the original k-prototypes algorithm. The proposed algorithm avoids distance computations by using partial distance computation. It finds the minimum distance without computing the distance over all attributes between an object and a cluster center, which reduces its time complexity. Partial distance computation relies on the fact that the maximum difference between two values of a categorical attribute is 1. If data objects have m categorical attributes, the maximum difference over the categorical attributes between an object and a cluster center is therefore m. Our algorithm first computes the distance using the numerical attributes only. If the difference between the smallest and the second smallest of these numerical distances is greater than m, the nearest cluster center can be determined without computing the distance over the categorical attributes. The experimental results show that the computational performance of the proposed k-prototypes algorithm is superior to that of the original k-prototypes algorithm on our dataset.
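The pruning test described in the abstract can be sketched as follows. This is a minimal illustration rather than the paper's implementation: the function name `assign_cluster`, the use of squared Euclidean distance for the numerical part, the integer coding of categories, and the weight `gamma` on the categorical part (set to 1 so that the bound equals m, as in the abstract) are all assumptions made for the example.

```python
import numpy as np

def assign_cluster(x_num, x_cat, centers_num, centers_cat, gamma=1.0):
    """Return (index of the nearest center, True if categorical work was skipped).

    Hypothetical helper: names, gamma, and the squared Euclidean numerical
    distance are assumptions for illustration, not taken from the paper.
    """
    # Numerical part of the distance from the object to every center.
    d_num = np.sum((centers_num - x_num) ** 2, axis=1)
    m = centers_cat.shape[1]  # number of categorical attributes

    if len(d_num) > 1:
        # Smallest and second-smallest numerical distances.
        best, second = np.partition(d_num, 1)[:2]
        # The categorical part lies in [0, gamma * m], so a larger gap
        # means no other center can overtake the numerical winner.
        if second - best > gamma * m:
            return int(np.argmin(d_num)), True

    # Otherwise fall back to the full mixed distance:
    # numerical part plus the weighted count of categorical mismatches.
    mismatches = np.sum(centers_cat != x_cat, axis=1)
    return int(np.argmin(d_num + gamma * mismatches)), False


# Two centers with two numerical and two (integer-coded) categorical attributes.
centers_num = np.array([[0.0, 0.0], [10.0, 10.0]])
centers_cat = np.array([[0, 1], [1, 0]])

# Far from the second center: the gap exceeds m = 2, so categories are skipped.
print(assign_cluster(np.array([0.5, 0.5]), np.array([1, 0]), centers_num, centers_cat))
# Equidistant numerically: the categorical attributes decide the assignment.
print(assign_cluster(np.array([5.0, 5.0]), np.array([0, 1]), centers_num, centers_cat))
```

Because the skipped term can never change which center is nearest, the assignment is identical to the one obtained from the full distance, which is why the accelerated algorithm can return the same clustering as the original.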
Kim, B. A Fast K-prototypes Algorithm Using Partial Distance Computation. Symmetry 2017, 9, 58.
Kim B. A Fast K-prototypes Algorithm Using Partial Distance Computation. Symmetry. 2017; 9(4):58.
Kim, Byoungwook. 2017. "A Fast K-prototypes Algorithm Using Partial Distance Computation." Symmetry 9, no. 4: 58.
Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.