Semi-Supervised Clustering via Constraints Self-Learning
Abstract
1. Introduction
- The proposed method learns partial discriminant feature spaces from partial pairwise constraints, i.e., discriminant spaces for the labeled points. By iterating constraints self-learning, it finally arrives at an approximately optimal discriminant feature space.
- The method self-learns additional constraints from local neighbors, measured by Euclidean distance, in the partial discriminant spaces. Even for complicated data, promising must-link and cannot-link constraints can be self-learned.
- The proposed constraint-based clustering algorithm is effective and uses only the constraint information to conduct clustering. Because it needs no prior knowledge of the number of clusters, it can also be used to detect outliers in a large dataset.
2. Related Works
3. Materials and Methods
3.1. Constraints Self-Learning from Partial Discriminant Spaces (CS-PDS)
- How do we learn, from partial constraint information (i.e., must-links and cannot-links), a discriminant space in which the Euclidean distance reflects the intrinsic structure of the data?
- Which local neighborhoods can be used as new constraints?
- How do we resolve conflicts between must-link and cannot-link constraints when we augment the constraints?
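The partial constraint information referred to in these questions can be made concrete with a small sketch. It assumes that a few instances carry class labels and that the must-link matrix M and cannot-link matrix C are symmetric 0/1 matrices over the N instances; the function name `constraints_from_labels` and the dictionary input are hypothetical choices for illustration.

```python
import numpy as np

def constraints_from_labels(n, labeled):
    """Build must-link (M) and cannot-link (C) matrices from a few labels.

    n       : total number of instances N
    labeled : dict mapping instance index -> class label (assumed input format)
    """
    M = np.zeros((n, n), dtype=int)
    C = np.zeros((n, n), dtype=int)
    idx = list(labeled)
    for a in range(len(idx)):
        for b in range(a + 1, len(idx)):
            i, j = idx[a], idx[b]
            if labeled[i] == labeled[j]:
                M[i, j] = M[j, i] = 1   # same label: must-link
            else:
                C[i, j] = C[j, i] = 1   # different labels: cannot-link
    return M, C

# Three of five instances are labeled; the other two stay unconstrained.
M, C = constraints_from_labels(5, {0: "a", 1: "a", 3: "b"})
```

Unlabeled instances keep all-zero rows in both matrices, which is what leaves room for new constraints to be learned later.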
3.1.1. Solution for the 1st Question: Partial Discriminant Space Exploring
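As a rough illustration of what a constraint-driven discriminant space can look like, the sketch below uses a generic scatter-difference projection: it keeps the directions along which cannot-link pairs are far apart and must-link pairs are close. This is a stand-in technique chosen for illustration, not necessarily the formulation used by CS-PDS, and the names `constraint_scatter` and `discriminant_projection` are hypothetical.

```python
import numpy as np

def constraint_scatter(X, pairs):
    """Sum of outer products of pairwise differences over the given index pairs."""
    d = X.shape[1]
    S = np.zeros((d, d))
    for i, j in pairs:
        diff = (X[i] - X[j])[:, None]
        S += diff @ diff.T
    return S

def discriminant_projection(X, must_pairs, cannot_pairs, n_components=1):
    """Return a (d, n_components) projection favouring cannot-link separation."""
    S_m = constraint_scatter(X, must_pairs)    # spread of must-link pairs
    S_c = constraint_scatter(X, cannot_pairs)  # spread of cannot-link pairs
    # S_c - S_m is symmetric, so eigh applies; eigenvalues come back ascending.
    _, vecs = np.linalg.eigh(S_c - S_m)
    return vecs[:, ::-1][:, :n_components]     # leading eigenvectors first

# Feature 0 separates the two groups; feature 1 only varies within a group.
X = np.array([[0.0, 0.0], [0.0, 5.0], [10.0, 1.0], [10.0, 6.0]])
W = discriminant_projection(X, [(0, 1), (2, 3)], [(0, 2), (1, 3)])
Z = X @ W  # in Z, must-link pairs end up closer than cannot-link pairs
```

Because only constrained pairs enter the scatter matrices, the resulting space is "partial" in the sense that it is shaped by the labeled points alone.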
3.1.2. Solution for the 2nd Question: Self-Learning
- (1) Constraints learning for each must-link instance u.
- (2) Constraints learning for each cannot-link instance v.
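The two cases above can be sketched as a single neighbor-based augmentation step. The sketch assumes an embedding `Z` of the data in the learned space, a neighborhood size `k`, and a simple propagation rule (a constrained instance is must-linked to its nearest neighbors, and those neighbors inherit its cannot-links); all three are illustrative assumptions rather than the exact rules of CS-PDS.

```python
import numpy as np

def augment_constraints(Z, M, C, k=1):
    """Propagate constraints to Euclidean nearest neighbours in the space Z.

    Z    : (N, d) array, instances in the (assumed) learned space
    M, C : symmetric 0/1 must-link and cannot-link matrices
    k    : number of nearest neighbours considered (hypothetical parameter)
    """
    D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)            # an instance is not its own neighbour
    M_new, C_new = M.copy(), C.copy()
    constrained = np.flatnonzero((M.sum(axis=1) + C.sum(axis=1)) > 0)
    for u in constrained:
        for p in np.argsort(D[u])[:k]:
            if C[u, p]:
                continue                   # never must-link a known cannot-link pair
            M_new[u, p] = M_new[p, u] = 1  # neighbour assumed to share u's cluster
            for v in np.flatnonzero(C[u]):
                if v != p:
                    C_new[p, v] = C_new[v, p] = 1  # neighbour inherits cannot-links
    return M_new, C_new
```

The step makes no attempt to resolve contradictions that propagation may introduce, for example when an inherited cannot-link lands on a pair that is already must-linked.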
3.1.3. Solution for the 3rd Question: Soft Regularization
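One simple way to treat conflicting constraints softly is to store a confidence for each link instead of a hard 0/1 value and to keep only the net evidence for each pair. The sketch below assumes real-valued, non-negative confidence matrices `M_conf` and `C_conf`; both the inputs and the cancellation rule are illustrative assumptions, not the regularization term defined in this section.

```python
import numpy as np

def resolve_conflicts(M_conf, C_conf):
    """Keep, for every pair, only the stronger of the two constraint types.

    M_conf, C_conf : symmetric arrays of non-negative confidences (assumed inputs)
    Returns soft must-link and cannot-link matrices with no overlapping entries.
    """
    M_soft = np.where(M_conf > C_conf, M_conf - C_conf, 0.0)
    C_soft = np.where(C_conf > M_conf, C_conf - M_conf, 0.0)
    return M_soft, C_soft

# Pair (0, 1) has strong must-link and weaker cannot-link evidence.
M_conf = np.array([[0.0, 0.9], [0.9, 0.0]])
C_conf = np.array([[0.0, 0.4], [0.4, 0.0]])
M_soft, C_soft = resolve_conflicts(M_conf, C_conf)
```

Equal confidences cancel out, so a pair with balanced evidence ends up unconstrained instead of being forced either way.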
3.1.4. The Algorithm
Algorithm 1 Constraints Self-Learning from Partial Discriminant Spaces (CS-PDS)
3.2. Finding Clusters on Constraints
Algorithm 2 Finding Clusters on Constraints (FC2)
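To illustrate how clusters can be read off constraints alone, without fixing the number of clusters in advance, the sketch below groups instances by connected components of the must-link matrix and marks instances in very small components as outlier candidates. It is a minimal union-find sketch under those assumptions, not the FC2 procedure itself; `clusters_from_must_links` and `min_size` are hypothetical names.

```python
from collections import Counter

import numpy as np

def clusters_from_must_links(M, min_size=2):
    """Label connected components of the must-link graph.

    M        : symmetric 0/1 must-link matrix
    min_size : components smaller than this get label -1 (outlier candidates)
    """
    n = M.shape[0]
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Merge the two endpoints of every must-link (upper triangle only).
    for i, j in zip(*np.nonzero(np.triu(M, k=1))):
        ri, rj = find(int(i)), find(int(j))
        if ri != rj:
            parent[rj] = ri

    roots = [find(i) for i in range(n)]
    sizes = Counter(roots)
    labels = np.full(n, -1)
    label_of_root = {}
    for i, r in enumerate(roots):
        if sizes[r] < min_size:
            continue                       # too small: leave as -1
        labels[i] = label_of_root.setdefault(r, len(label_of_root))
    return labels
```

The number of clusters falls out of the component structure, which is why no cluster count has to be supplied.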
4. Experiments
4.1. Datasets and Experiments Setup
4.2. Results
5. Conclusions
Funding
Data Availability Statement
Conflicts of Interest
Operator | Definition |
---|---|
, | N data instances |
M and C | must-link and cannot-link matrices |
and | ground truth constraint matrices |
and | target constraint matrices |
Name | #Instances | #Features | #Clusters |
---|---|---|---|
Protein | 116 | 20 | 6 |
Iris | 150 | 4 | 3 |
Wine | 178 | 13 | 3 |
Letters | 227 | 16 | 3 |
Digits | 317 | 16 | 3 |
Segmentation | 2310 | 19 | 7 |
Name | FC2 | MPCK-Means | C-K-Means | ITML | S-T |
---|---|---|---|---|---|
Protein | 0.098 | 1.705 | 0.016 | 1.962 | 29.48 |
Iris | 0.225 | 2.765 | 41.83 | 1.410 | 53.58 |
Wine | 0.170 | 3.710 | 0.017 | 2.123 | 77.50 |
Letters | 0.352 | 5.521 | 12.12 | 2.473 | 156.6 |
Digits | 4.704 | 0.907 | 0.161 | 1.500 | 2021 |
Segmentation | 33.57 | 2.246 | 9.470 | 7.613 | N/A |
© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Sun, X. Semi-Supervised Clustering via Constraints Self-Learning. Mathematics 2025, 13, 1535. https://doi.org/10.3390/math13091535