Article

An Improved Similarity-Based Clustering Algorithm for Multi-Database Mining

School of Computer Science, Wuhan University, Wuhan 430072, China
* Authors to whom correspondence should be addressed.
Academic Editor: Joaquín Abellán
Entropy 2021, 23(5), 553; https://doi.org/10.3390/e23050553
Received: 7 April 2021 / Revised: 22 April 2021 / Accepted: 26 April 2021 / Published: 29 April 2021
Clustering algorithms for multi-database mining (MDM) rely on computing $(n^2 - n)/2$ pairwise similarities between $n$ databases to generate and evaluate $m \in [1, (n^2 - n)/2]$ candidate clusterings, from which the partitioning that optimizes a predefined goodness measure is selected. However, when these pairwise similarities are distributed around the mean value, the clustering algorithm becomes indecisive about which database pairs are eligible to be grouped together. Consequently, a trivial result is produced, either by putting all $n$ databases in one cluster or by returning $n$ singleton clusters. To tackle this problem, we propose a learning algorithm that reduces the fuzziness of the similarity matrix by minimizing a weighted binary entropy loss function via gradient descent and back-propagation. As a result, the learned model improves the certainty of the clustering algorithm in identifying the optimal database clusters. Additionally, in contrast to gradient-based clustering algorithms, which are sensitive to the choice of the learning rate and require more iterations to converge, we propose a learning-rate-free algorithm that assesses the candidate clusterings generated on the fly in a smaller, upper-bounded number of iterations. To achieve this, we use coordinate descent (CD) and back-propagation to search for the optimal clustering of the $n$ databases in a way that minimizes a convex clustering quality measure $L(\theta)$ in fewer than $(n^2 - n)/2$ iterations. By using a max-heap data structure within our CD algorithm, we choose the largest weight variable $\theta_{p,q}^{(i)}$ at each iteration $i$, such that taking the partial derivative of $L(\theta)$ with respect to $\theta_{p,q}^{(i)}$ gives the next steepest descent step minimizing $L(\theta)$ without using a learning rate. Through a series of experiments on multiple database samples, we show that our algorithm outperforms the existing clustering algorithms for MDM.
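To make the quantities in the abstract concrete, the following Python sketch counts the $(n^2 - n)/2$ database pairs, measures the fuzziness of a similarity matrix as its mean binary entropy, and uses a heap to visit pairs from the largest value downward. It is only an illustration under stated assumptions, not the authors' algorithm: the function names, the unweighted entropy average, and the use of raw similarities (rather than the learned weights $\theta_{p,q}$) as heap keys are all hypothetical choices made for this example.

```python
import heapq
import math
from itertools import combinations


def pair_count(n):
    """Number of unordered database pairs, (n^2 - n) / 2."""
    return (n * n - n) // 2


def binary_entropy(s):
    """Binary entropy (in bits) of a similarity value s in [0, 1]."""
    if s <= 0.0 or s >= 1.0:
        return 0.0  # fully certain values carry no fuzziness
    return -(s * math.log2(s) + (1.0 - s) * math.log2(1.0 - s))


def mean_fuzziness(sim):
    """Average binary entropy over the upper triangle of a symmetric matrix.

    Assumption: an unweighted mean; the paper describes a weighted loss.
    """
    pairs = list(combinations(range(len(sim)), 2))
    return sum(binary_entropy(sim[p][q]) for p, q in pairs) / len(pairs)


def pairs_by_descending_value(sim):
    """Yield (value, p, q) from largest to smallest using a max-heap.

    heapq implements a min-heap, so values are negated on insertion.
    """
    heap = [(-sim[p][q], p, q) for p, q in combinations(range(len(sim)), 2)]
    heapq.heapify(heap)
    while heap:
        neg_value, p, q = heapq.heappop(heap)
        yield -neg_value, p, q


# Hypothetical similarity matrices for three databases.
fuzzy = [[1.0, 0.5, 0.5],
         [0.5, 1.0, 0.5],
         [0.5, 0.5, 1.0]]
mixed = [[1.0, 0.9, 0.2],
         [0.9, 1.0, 0.6],
         [0.2, 0.6, 1.0]]

print(pair_count(len(fuzzy)))                    # number of pairs for n = 3
print(mean_fuzziness(fuzzy))                     # maximal fuzziness: all values at 0.5
print(mean_fuzziness(mixed))                     # lower fuzziness
print(list(pairs_by_descending_value(mixed)))    # pairs ordered by similarity
```

A matrix whose entries all sit at 0.5 has the maximum mean entropy, which is the indecisive case the abstract describes; pushing the entries toward 0 or 1 lowers it.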
Keywords: coordinate descent; clustering; multi-database mining; fuzziness; binary entropy loss; similarity matrix

MDPI and ACS Style

Miloudi, S.; Wang, Y.; Ding, W. An Improved Similarity-Based Clustering Algorithm for Multi-Database Mining. Entropy 2021, 23, 553. https://doi.org/10.3390/e23050553

