SUN: Stochastic UNsupervised Learning for Data Noise and Uncertainty Reduction
Abstract
1. Introduction
2. The SUN Algorithm
2.1. Design Rationale
2.2. RUN-ICON, Stability Score, and Selection Rule
- Draw R K-means++ initialisations and fit K-means for each.
- Aggregate the R sets of centres and identify dominant centres by mode seeking under a small distance threshold in the centre space.
- Compute a stability score, the cluster dominance index (CDI), as the fraction of runs whose fitted centres match the same set of dominant centres within this threshold.
- Repeat the previous steps M times to obtain a sample of CDI values and summarise their spread by an uncertainty measure, i.e., the difference between the upper and lower CDI bounds across the outer repeats. A minimal code sketch of these steps follows this list.
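To make the repeated-initialisation step concrete, the following is a minimal sketch, assuming scikit-learn's KMeans is available; the function name run_icon_stability, the number of runs R, the agreement threshold eps, and the centre-matching rule are illustrative choices, not the authors' exact implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def run_icon_stability(X, n_clusters, R=50, eps=0.05, seed=0):
    """Repeat K-means++ R times, group centre sets that agree to within eps,
    and report the dominant set with its stability score (the fraction of
    runs that reproduce it)."""
    rng = np.random.default_rng(seed)
    all_centres = []
    for _ in range(R):
        km = KMeans(n_clusters=n_clusters, init="k-means++", n_init=1,
                    random_state=int(rng.integers(0, 2**31 - 1))).fit(X)
        # Sort centres by their first coordinate so that runs are comparable
        order = np.argsort(km.cluster_centers_[:, 0])
        all_centres.append(km.cluster_centers_[order])

    # For each run, count how many runs land within eps of its centre set
    counts = np.zeros(R, dtype=int)
    for i, ci in enumerate(all_centres):
        for cj in all_centres:
            if np.max(np.linalg.norm(ci - cj, axis=1)) < eps:
                counts[i] += 1

    best = int(np.argmax(counts))
    dominant_centres = all_centres[best]
    stability = counts[best] / R   # fraction of runs agreeing with the dominant set
    return dominant_centres, stability
```

Wrapping this call in an outer loop of M repeats yields a sample of stability scores, and the difference between its upper and lower bounds gives the uncertainty measure described above.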
2.3. Gaussian Mixture Models
2.3.1. Stochastic Modelling—Mathematical Theory
2.3.2. Setting up the GMM Framework
2.3.3. Expectation–Maximisation (EM) Algorithm
- E-step. In this step, we compute the probability that a given data point belongs to cluster k; K probabilities are evaluated for each point, one for its membership in each of the K clusters. The highest probability determines the cluster to which the point is assigned, so a point may end up in a different cluster from the one it was originally associated with.
- M-step. This step involves reassigning points to clusters and updating the cluster parameters. The means, covariance matrices, and weights are recalculated from the newly assigned points using Equations (A1), (A2) and (A4), respectively. While the number of clusters K remains unchanged, the cluster centres are adjusted to reflect the updated means. A minimal sketch of one EM sweep is given after this list.
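As a concrete illustration of the two steps, here is a minimal NumPy/SciPy sketch of one EM sweep for a GMM with full covariance matrices; the function name em_step is hypothetical, and, unlike the hard reassignment described above, the sketch keeps the standard soft responsibilities (a hard assignment would take their argmax).

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_step(X, means, covs, weights):
    """One EM sweep for a GMM: the E-step computes soft memberships
    (responsibilities); the M-step re-estimates weights, means and
    covariances from them."""
    n, _ = X.shape
    K = len(weights)

    # E-step: responsibility of each cluster k for each point
    resp = np.zeros((n, K))
    for k in range(K):
        resp[:, k] = weights[k] * multivariate_normal.pdf(X, means[k], covs[k])
    resp /= resp.sum(axis=1, keepdims=True)

    # M-step: update the parameters from the responsibilities
    Nk = resp.sum(axis=0)
    new_weights = Nk / n
    new_means = (resp.T @ X) / Nk[:, None]
    new_covs = []
    for k in range(K):
        d = X - new_means[k]
        new_covs.append((resp[:, k, None] * d).T @ d / Nk[k])
    return new_means, np.array(new_covs), new_weights, resp
```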
2.4. Default Settings and Diagnostics
2.5. Implementation Steps of the SUN Algorithm
- Initialisation with RUN-ICON
- EM algorithm: Execute the EM algorithm to refine the GMM parameters:
- E-step: Calculate the probabilities for each point to belong to any of the K clusters.
- M-step: Redistribute the points among the K clusters based on the highest probability scores and recalculate the means, covariance matrices, and weights.
- Check for convergence: The previously and newly calculated means are compared; if their difference is below the user-set tolerance, the algorithm terminates. Otherwise, we return to Step 2 and repeat. A minimal sketch of this loop is given after this list.
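Putting the three steps together, a minimal sketch of the outer loop might look as follows, reusing the run_icon_stability and em_step sketches shown earlier; the initial covariance and weight choices and the tolerance default are assumptions for illustration only.

```python
import numpy as np

def sun_fit(X, n_clusters, tol=1e-4, max_iter=200):
    """Illustrative SUN loop: initialise with the RUN-ICON dominant centres,
    then iterate EM until the change in the means drops below the tolerance."""
    means, _ = run_icon_stability(X, n_clusters)               # Step 1: initialisation
    covs = np.array([np.cov(X, rowvar=False)] * n_clusters)    # start from the data covariance
    weights = np.full(n_clusters, 1.0 / n_clusters)            # equal initial weights

    for _ in range(max_iter):
        new_means, covs, weights, resp = em_step(X, means, covs, weights)   # Step 2: EM sweep
        converged = np.max(np.linalg.norm(new_means - means, axis=1)) < tol  # Step 3: check
        means = new_means
        if converged:
            break
    labels = resp.argmax(axis=1)   # hard labels from the soft assignments
    return means, covs, weights, labels
```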
2.6. Theoretical Convergence Properties and Noise Suppression
2.6.1. Convergence Analysis
- RUN-ICON Convergence: The algorithm converges to dominant cluster centres through statistical mode seeking. Let C(i) denote the cluster centres obtained in the i-th iteration; as the number of repeats grows, the dominance frequency of the recurring centre set converges to its maximum, attained at the true cluster centres C*.
- GMM-EM Convergence: The EM algorithm guarantees a monotonic (non-decreasing) log-likelihood from one iteration to the next, where θ denotes the GMM parameters (means, covariances, and weights).
- Combined Convergence: The SUN algorithm is deemed converged once the RUN-ICON stage has settled on a dominant centre set and the subsequent EM iterations satisfy the tolerance criterion of Section 2.5. Both conditions are summarised in the block below.
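For reference, the two statements can be written compactly as below; the symbols θ (the GMM parameter set), L (the log-likelihood), μ_k (the cluster means) and ε_tol (the user-set tolerance) are notational choices made here and need not match the symbols of the original typeset equations.

```latex
% EM monotonicity: the log-likelihood never decreases from iteration t to t+1
\mathcal{L}\left(\theta^{(t+1)}\right) \;\geq\; \mathcal{L}\left(\theta^{(t)}\right)

% SUN stopping rule: successive mean estimates move by less than the tolerance
\max_{k} \left\lVert \mu_{k}^{(t+1)} - \mu_{k}^{(t)} \right\rVert \;<\; \varepsilon_{\mathrm{tol}}
```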
2.6.2. Noise Suppression Mechanism
- Statistical Filtering (RUN-ICON): By aggregating multiple K-means++ runs, random initialisation effects are averaged out, so spurious centres produced by individual unlucky seeds occur in only a small fraction of runs and are rejected by the dominance criterion.
- Probabilistic Smoothing (GMM): The soft assignment probabilities act as a low-pass filter: points whose membership probabilities fall below a threshold determined by the cluster covariance structure contribute little to the parameter updates and can be flagged as noise; see the sketch after this list.
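A minimal sketch of how such a covariance-informed cut-off could be applied to the soft assignments is given below; the function name flag_noise and the specific threshold heuristic are assumptions for illustration only.

```python
import numpy as np

def flag_noise(resp, covs, scale=0.5):
    """Flag points whose best soft-assignment probability falls below a
    covariance-informed threshold; the heuristic scale/(1 + sqrt(det)) is
    purely illustrative."""
    dets = np.array([np.linalg.det(c) for c in covs])   # spread of each cluster
    thresholds = scale / (1.0 + np.sqrt(dets))          # diffuse clusters get lower cut-offs
    best = resp.max(axis=1)                             # maximum membership per point
    cluster = resp.argmax(axis=1)                       # cluster achieving that maximum
    return best < thresholds[cluster]                   # True where a point looks like noise
```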
3. Results and Discussion
- The “noisy” dataset (dataset “zelnik2” in [93]) contains randomly generated background noise and was chosen to test the efficiency of the algorithm in situations where noise is present.
- The “Mickey Mouse” dataset was generated by the authors to test the ability of the algorithm to identify objects with distinct shapes.
- The “particles” dataset was generated by the authors to test the algorithm’s ability to identify regions with distinct flow characteristics during particle dispersion.
- The Wine Recognition dataset [94] is a well-established multimodal dataset that enables us to test the robustness of the method under heterogeneous feature spaces, which is essential for applications where the underlying phenomena arise from multiple interacting modes of behaviour.
3.1. “Noisy” Dataset
3.2. “Mickey Mouse” Dataset
3.3. “Particles” Dataset
3.4. Multimodal Wine Recognition Dataset
3.5. Comparisons with Other Clustering Methods/Schemes
3.6. Potential Applications
- Medical Imaging: Tumour segmentation in MRI/CT scans with uncertainty quantification for surgical planning.
- Financial Markets: Anomaly detection in trading patterns with confidence intervals for risk assessment.
- Climate Science: Classification of weather patterns with probabilistic boundaries for improved forecasting.
- Manufacturing: Quality control clustering with confidence metrics for defect detection.
- Social Networks: Community detection with probabilistic boundaries for influence analysis.
- Genomics: Cell type identification in single-cell RNA sequencing with uncertainty estimates.
- Cybersecurity: Network traffic clustering for intrusion detection with false positive probability estimates.
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A. The GMM Framework
- μ_k and Σ_k denote the mean vector and covariance matrix of cluster k.
- μ_kj is the j-th element of the mean matrix μ.
- N_k is the number of points in cluster k.
- x_j is the j-th coordinate of a point that belongs to cluster k.
- X_k − μ_k is the matrix of all cluster-k points minus the corresponding μ_k component in each row.
- |Σ_k| and Σ_k⁻¹ are the determinant and inverse of the covariance matrix, respectively (both appear in the density sketch after this list).
- (x − μ_k) is the vector of differences between point x and the respective components of μ_k.
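To connect the notation to computation, the following is a minimal sketch of the component density evaluation that uses the determinant and inverse listed above; the function name gaussian_density is illustrative.

```python
import numpy as np

def gaussian_density(x, mu_k, sigma_k):
    """Evaluate the multivariate normal density N(x | mu_k, Sigma_k) directly
    from the determinant and inverse of the covariance matrix."""
    d = x.shape[0]
    diff = x - mu_k                          # the (x - mu_k) difference vector
    inv = np.linalg.inv(sigma_k)             # Sigma_k^{-1}
    det = np.linalg.det(sigma_k)             # |Sigma_k|
    norm = 1.0 / np.sqrt(((2.0 * np.pi) ** d) * det)
    return norm * np.exp(-0.5 * diff @ inv @ diff)
```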
Appendix B. Pseudocode for the SUN Algorithm
Algorithm A1. SUN: Stability-driven clustering with probabilistic assignments.
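As a compact, library-based illustration of Algorithm A1, one possible realisation with scikit-learn is sketched below; the median-of-centres initialisation is a simplification of the mode-seeking rule of Section 2.2, and the function name sun_with_sklearn and its defaults are assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

def sun_with_sklearn(X, n_clusters, R=50, random_state=0):
    """Stability-driven initialisation (repeated K-means++) followed by a GMM
    fitted with EM, returning hard labels and soft assignment probabilities."""
    rng = np.random.default_rng(random_state)
    runs = []
    for _ in range(R):
        km = KMeans(n_clusters=n_clusters, init="k-means++", n_init=1,
                    random_state=int(rng.integers(0, 2**31 - 1))).fit(X)
        order = np.argsort(km.cluster_centers_[:, 0])     # align runs for comparison
        runs.append(km.cluster_centers_[order])
    init_means = np.median(np.stack(runs), axis=0)        # robust summary of the runs

    gmm = GaussianMixture(n_components=n_clusters, means_init=init_means,
                          covariance_type="full", random_state=random_state).fit(X)
    return gmm.predict(X), gmm.predict_proba(X), gmm
```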
References
- Jin, X.; Han, J. K-Means Clustering. In Encyclopedia of Machine Learning; Sammut, C., Webb, G.I., Eds.; Springer: Boston, MA, USA, 2010; pp. 563–564. [Google Scholar] [CrossRef]
- Alloghani, M.; Al-Jumeily, D.; Mustafina, J.; Hussain, A.; Aljaaf, A.J. A Systematic Review on Supervised and Unsupervised Machine Learning Algorithms for Data Science. In Supervised and Unsupervised Learning for Data Science; Berry, M.W., Mohamed, A., Yap, B.W., Eds.; Springer International Publishing: Cham, Switzerland, 2020; pp. 3–21. [Google Scholar] [CrossRef]
- Glielmo, A.; Husic, B.E.; Rodriguez, A.; Clementi, C.; Noé, F.; Laio, A. Unsupervised Learning Methods for Molecular Simulation Data. Chem. Rev. 2021, 121, 9722–9758. [Google Scholar] [CrossRef] [PubMed]
- Neuer, M.J. Unsupervised Learning. In Machine Learning for Engineers: Introduction to Physics-Informed, Explainable Learning Methods for AI in Engineering Applications; Springer: Berlin/Heidelberg, Germany, 2025; pp. 141–172. [Google Scholar] [CrossRef]
- Bekker, J.; Davis, J. Learning from positive and unlabeled data: A survey. Mach. Learn. 2020, 109, 719–760. [Google Scholar] [CrossRef]
- Li, B.; Cheng, F.; Zhang, X.; Cui, C.; Cai, W. A novel semi-supervised data-driven method for chiller fault diagnosis with unlabeled data. Appl. Energy 2021, 285, 116459. [Google Scholar] [CrossRef]
- Liu, P.; Wang, L.; Ranjan, R.; He, G.; Zhao, L. A Survey on Active Deep Learning: From Model Driven to Data Driven. ACM Comput. Surv. 2022, 54, 1–34. [Google Scholar] [CrossRef]
- Priyadarshi, R.; Ranjan, R.; Kumar Vishwakarma, A.; Yang, T.; Singh Rathore, R. Exploring the Frontiers of Unsupervised Learning Techniques for Diagnosis of Cardiovascular Disorder: A Systematic Review. IEEE Access 2024, 12, 139253–139272. [Google Scholar] [CrossRef]
- Fotopoulou, S. A review of unsupervised learning in astronomy. Astron. Comput. 2024, 48, 100851. [Google Scholar] [CrossRef]
- Xu, T. Credit Risk Assessment Using a Combined Approach of Supervised and Unsupervised Learning. J. Comput. Methods Eng. Appl. 2024, 4, 1–12. [Google Scholar] [CrossRef]
- Tautan, A.M.; Andrei, A.G.; Smeralda, C.L.; Vatti, G.; Rossi, S.; Ionescu, B. Unsupervised learning from EEG data for epilepsy: A systematic literature review. Artif. Intell. Med. 2025, 162, 103095. [Google Scholar] [CrossRef]
- Kansal, T.; Bahuguna, S.; Singh, V.; Choudhury, T. Customer Segmentation using K-means Clustering. In Proceedings of the 2018 International Conference on Computational Techniques, Electronics and Mechanical Systems (CTEMS), Belgaum, India, 21–22 December 2018; pp. 135–139. [Google Scholar] [CrossRef]
- Afzal, A.; Khan, L.; Hussain, M.Z.; Hasan, M.Z.; Mustafa, M.; Khalid, A.; Awan, R.; Ashraf, F.; Khan, Z.A.; Javaid, A. Customer Segmentation Using Hierarchical Clustering. In Proceedings of the 2024 IEEE 9th International Conference for Convergence in Technology (I2CT), Pune, India, 5–7 August 2024; pp. 1–6. [Google Scholar] [CrossRef]
- Patriarca, R.; Simone, F.; Di Gravio, G. Supporting weather forecasting performance management at aerodromes through anomaly detection and hierarchical clustering. Expert Syst. Appl. 2023, 213, 119210. [Google Scholar] [CrossRef]
- Jain, M.; Kaur, G.; Saxena, V. A K-Means clustering and SVM based hybrid concept drift detection technique for network anomaly detection. Expert Syst. Appl. 2022, 193, 116510. [Google Scholar] [CrossRef]
- Karczmarek, P.; Kiersztyn, A.; Pedrycz, W.; Al, E. K-Means-based isolation forest. Knowl.-Based Syst. 2020, 195, 105659. [Google Scholar] [CrossRef]
- Chhabra, P.; Garg, N.K.; Kumar, M. Content-based image retrieval system using ORB and SIFT features. Neural Comput. Appl. 2020, 32, 2725–2733. [Google Scholar] [CrossRef]
- Du, M.; Ding, S.; Jia, H. Study on density peaks clustering based on k-nearest neighbors and principal component analysis. Knowl.-Based Syst. 2016, 99, 135–145. [Google Scholar] [CrossRef]
- Nilashi, M.; Ibrahim, O.; Bagherifard, K. A recommender system based on collaborative filtering using ontology and dimensionality reduction techniques. Expert Syst. Appl. 2018, 92, 507–520. [Google Scholar] [CrossRef]
- Lu, Y.; Zhang, Y.; Cui, Z.; Long, W.; Chen, Z. Multi-Dimensional Manifolds Consistency Regularization for semi-supervised remote sensing semantic segmentation. Knowl.-Based Syst. 2024, 299, 112032. [Google Scholar] [CrossRef]
- Xi, L.; Liang, C.; Liu, H.; Li, A. Unsupervised dimension-contribution-aware embeddings transformation for anomaly detection. Knowl.-Based Syst. 2023, 262, 110209. [Google Scholar] [CrossRef]
- Hilal, W.; Gadsden, S.A.; Yawney, J. Financial Fraud: A Review of Anomaly Detection Techniques and Recent Advances. Expert Syst. Appl. 2022, 193, 116429. [Google Scholar] [CrossRef]
- Verma, K.K.; Singh, B.M.; Dixit, A. A review of supervised and unsupervised machine learning techniques for suspicious behavior recognition in intelligent surveillance system. Int. J. Inf. Technol. 2022, 14, 397–410. [Google Scholar] [CrossRef]
- Zeiser, A.; Özcan, B.; van Stein, B.; Bäck, T. Evaluation of deep unsupervised anomaly detection methods with a data-centric approach for on-line inspection. Comput. Ind. 2023, 146, 103852. [Google Scholar] [CrossRef]
- Kingma, D.P.; Welling, M. An introduction to variational autoencoders. Found. Trends® Mach. Learn. 2019, 12, 307–392. [Google Scholar] [CrossRef]
- Wang, X.; Du, Y.; Lin, S.; Cui, P.; Shen, Y.; Yang, Y. adVAE: A self-adversarial variational autoencoder with Gaussian anomaly prior knowledge for anomaly detection. Knowl.-Based Syst. 2020, 190, 105187. [Google Scholar] [CrossRef]
- Zhou, F.; Yang, S.; Fujita, H.; Chen, D.; Wen, C. Deep learning fault diagnosis method based on global optimization GAN for unbalanced data. Knowl.-Based Syst. 2020, 187, 104837. [Google Scholar] [CrossRef]
- Ozbey, M.; Dalmaz, O.; Dar, S.U.H.; Bedel, H.A.; Ozturk, Ş.; Güngör, A.; Çukur, T. Unsupervised Medical Image Translation With Adversarial Diffusion Models. IEEE Trans. Med. Imaging 2023, 42, 3524–3539. [Google Scholar] [CrossRef] [PubMed]
- Chadebec, C.; Vincent, L.; Allassonniere, S. Pythae: Unifying Generative Autoencoders in Python—A Benchmarking Use Case. In Advances in Neural Information Processing Systems; Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., Oh, A., Eds.; Curran Associates, Inc.: San Jose, CA, USA, 2022; Volume 35, pp. 21575–21589. [Google Scholar]
- Das, A.K.; Goswami, S.; Chakrabarti, A.; Chakraborty, B. A new hybrid feature selection approach using feature association map for supervised and unsupervised classification. Expert Syst. Appl. 2017, 88, 81–94. [Google Scholar] [CrossRef]
- Shen, B. E-commerce Customer Segmentation via Unsupervised Machine Learning. In Proceedings of the 2nd International Conference on Computing and Data Science, Stanford, CA, USA, 28–30 January 2021. [Google Scholar] [CrossRef]
- Naeem, S.; Ali, A.; Anam, S.; Ahmed, M.M. An unsupervised machine learning algorithms: Comprehensive review. Int. J. Comput. Digit. Syst. 2023, 13, 911–921. [Google Scholar] [CrossRef]
- Gui, J.; Chen, T.; Zhang, J.; Cao, Q.; Sun, Z.; Luo, H.; Tao, D. A Survey on Self-Supervised Learning: Algorithms, Applications, and Future Trends. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 9052–9071. [Google Scholar] [CrossRef]
- Chen, T.; Kornblith, S.; Swersky, K.; Norouzi, M.; Hinton, G.E. Big Self-Supervised Models are Strong Semi-Supervised Learners. In Advances in Neural Information Processing Systems; Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., Lin, H., Eds.; Curran Associates, Inc.: San Jose, CA, USA, 2020; Volume 33, pp. 22243–22255. [Google Scholar]
- Wang, X.; Zhang, R.; Shen, C.; Kong, T.; Li, L. Dense Contrastive Learning for Self-Supervised Visual Pre-Training. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; p. 11, Oral paper. [Google Scholar]
- Grill, J.B.; Strub, F.; Altché, F.; Tallec, C.; Richemond, P.; Buchatskaya, E.; Doersch, C.; Avila Pires, B.; Guo, Z.; Gheshlaghi Azar, M.; et al. Bootstrap Your Own Latent—A New Approach to Self-Supervised Learning. In Advances in Neural Information Processing Systems; Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., Lin, H., Eds.; Curran Associates, Inc.: San Jose, CA, USA, 2020; Volume 33, pp. 21271–21284. [Google Scholar]
- Koroteev, M.V. BERT: A review of applications in natural language processing and understanding. arXiv 2021, arXiv:2103.11943. [Google Scholar] [CrossRef]
- Achiam, J.; Adler, S.; Agarwal, S.; Ahmad, L.; Akkaya, I.; Aleman, F.L.; Almeida, D.; Altenschmidt, J.; Altman, S.; Anadkat, S.; et al. GPT-4 technical report. arXiv 2023, arXiv:2303.08774. [Google Scholar] [CrossRef]
- Xie, Q.; Bishop, J.A.; Tiwari, P.; Ananiadou, S. Pre-trained language models with domain knowledge for biomedical extractive summarization. Knowl.-Based Syst. 2022, 252, 109460. [Google Scholar] [CrossRef]
- Yin, R.; Li, K.; Zhang, G.; Lu, J. A deeper graph neural network for recommender systems. Knowl.-Based Syst. 2019, 185, 105020. [Google Scholar] [CrossRef]
- Wu, Z.; Pan, S.; Chen, F.; Long, G.; Zhang, C.; Yu, P.S. A Comprehensive Survey on Graph Neural Networks. IEEE Trans. Neural Netw. Learn. Syst. 2021, 32, 4–24. [Google Scholar] [CrossRef] [PubMed]
- Tan, Q.; Liu, N.; Huang, X.; Choi, S.H.; Li, L.; Chen, R.; Hu, X. S2GAE: Self-Supervised Graph Autoencoders are Generalizable Learners with Graph Masking. In Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining, Singapore, 27 February–3 March 2023; pp. 787–795. [Google Scholar] [CrossRef]
- Shao, M.; Lin, Y.; Peng, Q.; Zhao, J.; Pei, Z.; Sun, Y. Learning graph deep autoencoder for anomaly detection in multi-attributed networks. Knowl.-Based Syst. 2023, 260, 110084. [Google Scholar] [CrossRef]
- Sheng, V.S.; Provost, F.; Ipeirotis, P.G. Get another label? Improving data quality and data mining using multiple, noisy labelers. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, NV, USA, 24–27 August 2008; pp. 614–622. [Google Scholar] [CrossRef]
- Aldoseri, A.; Al-Khalifa, K.N.; Hamouda, A.M. Re-Thinking Data Strategy and Integration for Artificial Intelligence: Concepts, Opportunities, and Challenges. Appl. Sci. 2023, 13, 7082. [Google Scholar] [CrossRef]
- Tang, H.; Zhu, X.; Chen, K.; Jia, K.; Chen, C.L.P. Towards Uncovering the Intrinsic Data Structures for Unsupervised Domain Adaptation Using Structurally Regularized Deep Clustering. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 6517–6533. [Google Scholar] [CrossRef]
- Wang, J.; Biljecki, F. Unsupervised machine learning in urban studies: A systematic review of applications. Cities 2022, 129, 103925. [Google Scholar] [CrossRef]
- Christakis, N.; Drikakis, D. Reducing Uncertainty and Increasing Confidence in Unsupervised Learning. Mathematics 2023, 11, 3063. [Google Scholar] [CrossRef]
- Gupta, S.; Gupta, A. Dealing with Noise Problem in Machine Learning Data-sets: A Systematic Review. Procedia Comput. Sci. 2019, 161, 466–474. [Google Scholar] [CrossRef]
- Zhang, G.; Hao, H.; Wang, Y.; Jiang, Y.; Shi, J.; Yu, J.; Cui, X.; Li, J.; Zhou, S.; Yu, B. Optimized adaptive Savitzky-Golay filtering algorithm based on deep learning network for absorption spectroscopy. Spectrochim. Acta Part A Mol. Biomol. Spectrosc. 2021, 263, 120187. [Google Scholar] [CrossRef]
- Kascenas, A.; Pugeault, N.; O’Neil, A.Q. Denoising Autoencoders for Unsupervised Anomaly Detection in Brain MRI. In Proceedings of the 5th International Conference on Medical Imaging with Deep Learning; Konukoglu, E., Menze, B., Venkataraman, A., Baumgartner, C., Dou, Q., Albarqouni, S., Eds.; PMLR: Cambridge, MA, USA, 2022; Volume 172, pp. 653–664. [Google Scholar]
- Mansour, Y.; Heckel, R. Zero-Shot Noise2Noise: Efficient Image Denoising Without Any Data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 14018–14027. [Google Scholar]
- Krull, A.; Buchholz, T.O.; Jug, F. Noise2Void - Learning Denoising From Single Noisy Images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019. [Google Scholar]
- Quan, Y.; Chen, M.; Pang, T.; Ji, H. Self2Self With Dropout: Learning Self-Supervised Denoising From Single Image. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020. [Google Scholar]
- Kendall, A.; Gal, Y. What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision? In Advances in Neural Information Processing Systems; Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc.: San Jose, CA, USA, 2017; Volume 30. [Google Scholar]
- Sucar, L.E. Probabilistic Graphical Models; Springer: Berlin/Heidelberg, Germany, 2021. [Google Scholar]
- Wilson, A.G.; Izmailov, P.; Hoffman, M.D.; Gal, Y.; Li, Y.; Pradier, M.F.; Vikram, S.; Foong, A.; Lotfi, S.; Farquhar, S. Evaluating Approximate Inference in Bayesian Deep Learning. In NeurIPS 2021 Competitions and Demonstrations Track; Kiela, D., Ciccone, M., Caputo, B., Eds.; PMLR: Cambridge, MA, USA, 2022; Volume 176, pp. 113–124. [Google Scholar]
- Zhang, J. Modern Monte Carlo methods for efficient uncertainty quantification and propagation: A survey. WIREs Comput. Stat. 2021, 13, e1539. [Google Scholar] [CrossRef]
- Abdar, M.; Pourpanah, F.; Hussain, S.; Rezazadegan, D.; Liu, L.; Ghavamzadeh, M.; Fieguth, P.; Cao, X.; Khosravi, A.; Acharya, U.R.; et al. A review of uncertainty quantification in deep learning: Techniques, applications and challenges. Inf. Fusion 2021, 76, 243–297. [Google Scholar] [CrossRef]
- Tomani, C.; Cremers, D.; Buettner, F. Parameterized Temperature Scaling for Boosting the Expressive Power in Post-Hoc Uncertainty Calibration. In Computer Vision—ECCV 2022; Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T., Eds.; Springer: Cham, Switzerland, 2022; pp. 555–569. [Google Scholar] [CrossRef]
- Li, Y.; Sur, P. Optimal and Provable Calibration in High-Dimensional Binary Classification: Angular Calibration and Platt Scaling. arXiv 2025, arXiv:2502.15131. [Google Scholar] [CrossRef]
- Zhang, C.B.; Jiang, P.T.; Hou, Q.; Wei, Y.; Han, Q.; Li, Z.; Cheng, M.M. Delving Deep Into Label Smoothing. IEEE Trans. Image Process. 2021, 30, 5984–5996. [Google Scholar] [CrossRef] [PubMed]
- Tao, J.; Li, Q.; Zhu, C.; Li, J. A hierarchical naive Bayes network classifier embedded GMM for textural image. Int. J. Appl. Earth Obs. Geoinf. 2012, 14, 139–148. [Google Scholar] [CrossRef]
- Jiang, Z.; Zheng, Y.; Tan, H.; Tang, B.; Zhou, H. Variational Deep Embedding: An Unsupervised and Generative Approach to Clustering. arXiv 2017, arXiv:1611.05148. [Google Scholar] [CrossRef]
- Guo, X.; Gao, L.; Liu, X.; Yin, J. Improved deep embedded clustering with local structure preservation. In Proceedings of the 26th International Joint Conference on Artificial Intelligence, Melbourne, Australia, 19–25 August 2017; Volume 17, pp. 1753–1759. [Google Scholar]
- Wang, P.; Ding, C.; Tan, W.; Gong, M.; Jia, K.; Tao, D. Uncertainty-Aware Clustering for Unsupervised Domain Adaptive Object Re-Identification. IEEE Trans. Multimed. 2023, 25, 2624–2635. [Google Scholar] [CrossRef]
- Kontolati, K.; Loukrezis, D.; Giovanis, D.G.; Vandanapu, L.; Shields, M.D. A survey of unsupervised learning methods for high-dimensional uncertainty quantification in black-box-type problems. J. Comput. Phys. 2022, 464, 111313. [Google Scholar] [CrossRef]
- Karthik, E.N.; Cheriet, F.; Laporte, C. Uncertainty Estimation in Unsupervised MR-CT Synthesis of Scoliotic Spines. IEEE Open J. Eng. Med. Biol. 2024, 5, 421–427. [Google Scholar] [CrossRef]
- Dbouk, T.; Drikakis, D. On coughing and airborne droplet transmission to humans. Phys. Fluids 2020, 32, 053310. [Google Scholar] [CrossRef]
- Mousavi, Z.; Yousefi Rezaii, T.; Sheykhivand, S.; Farzamnia, A.; Razavi, S. Deep convolutional neural network for classification of sleep stages from single-channel EEG signals. J. Neurosci. Methods 2019, 324, 108312. [Google Scholar] [CrossRef]
- Lee, J.; Lee, G. Feature Alignment by Uncertainty and Self-Training for Source-Free Unsupervised Domain Adaptation. Neural Netw. 2023, 161, 682–692. [Google Scholar] [CrossRef]
- Lee, J.; Lee, G. Unsupervised domain adaptation based on the predictive uncertainty of models. Neurocomputing 2023, 520, 183–193. [Google Scholar] [CrossRef]
- Eltouny, K.; Gomaa, M.; Liang, X. Unsupervised learning methods for data-driven vibration-based structural health monitoring: A review. Sensors 2023, 23, 3290. [Google Scholar] [CrossRef] [PubMed]
- Christakis, N.; Drikakis, D. Unsupervised Learning of Particles Dispersion. Mathematics 2023, 11, 3637. [Google Scholar] [CrossRef]
- Christakis, N.; Drikakis, D.; Ritos, K.; Kokkinakis, I.W. Unsupervised machine learning of virus dispersion indoors. Phys. Fluids 2024, 36, 013320. [Google Scholar] [CrossRef]
- Christakis, N.; Drikakis, D. On particle dispersion statistics using unsupervised learning and Gaussian mixture models. Phys. Fluids 2024, 36, 093317. [Google Scholar] [CrossRef]
- Christakis, N.; Drikakis, D.; Kokkinakis, I.W. Advancing understanding of indoor conditions using artificial intelligence methods. Phys. Fluids 2025, 37, 015160. [Google Scholar] [CrossRef]
- Panić, B.; Klemenc, J.; Nagode, M. Gaussian Mixture Model Based Classification Revisited: Application to the Bearing Fault Classification. J. Mech. Eng. Vestn. 2020, 66, 215–226. [Google Scholar] [CrossRef]
- Jebarani, P.E.; Umadevi, N.; Dang, H.; Pomplun, M. A novel hybrid K-means and GMM machine learning model for breast cancer detection. IEEE Access 2021, 9, 146153–146162. [Google Scholar] [CrossRef]
- Hajihosseinlou, M.; Maghsoudi, A.; Ghezelbash, R. A comprehensive evaluation of OPTICS, GMM and K-means clustering methodologies for geochemical anomaly detection connected with sample catchment basins. Geochemistry 2024, 84, 126094. [Google Scholar] [CrossRef]
- Mateo-Gabín, A.; Tlales, K.; Valero, E.; Ferrer, E.; Rubio, G. An unsupervised machine-learning-based shock sensor: Application to high-order supersonic flow solvers. Expert Syst. Appl. 2025, 270, 126352. [Google Scholar] [CrossRef]
- Chan, C.; Feng, F.; Ottinger, J.; Foster, D.; West, M.; Kepler, T.B. Statistical mixture modeling for cell subtype identification in flow cytometry. Cytom. Part J. Int. Soc. Anal. Cytol. 2008, 73, 693–701. [Google Scholar] [CrossRef] [PubMed]
- Khansari-Zadeh, S.M.; Billard, A. Learning stable nonlinear dynamical systems with gaussian mixture models. IEEE Trans. Robot. 2011, 27, 943–957. [Google Scholar] [CrossRef]
- Wiest, J.; Höffken, M.; Kreßel, U.; Dietmayer, K. Probabilistic trajectory prediction with Gaussian mixture models. In Proceedings of the 2012 IEEE Intelligent Vehicles Symposium, Alcala de Henares, Spain, 3–7 June 2012; pp. 141–146. [Google Scholar]
- Varolgüneş, Y.B.; Bereau, T.; Rudzinski, J.F. Interpretable embeddings from molecular simulations using Gaussian mixture variational autoencoders. Mach. Learn. Sci. Technol. 2020, 1, 015012. [Google Scholar] [CrossRef]
- Bitaab, M.; Hashemi, S. Hybrid intrusion detection: Combining decision tree and gaussian mixture model. In Proceedings of the 2017 14th International ISC (Iranian Society of Cryptology) Conference on Information Security and Cryptology (ISCISC), Shiraz, Iran, 6–7 September 2017; pp. 8–12. [Google Scholar]
- Wan, H.; Wang, H.; Scotney, B.; Liu, J. A novel Gaussian mixture model for classification. In Proceedings of the 2019 IEEE International Conference on Systems, Man and Cybernetics (SMC), Bari, Italy, 6–9 October 2019; pp. 3298–3303. [Google Scholar]
- Scrucca, L.; Fop, M.; Murphy, T.B.; Raftery, A.E. mclust 5: Clustering, classification and density estimation using Gaussian finite mixture models. R J. 2016, 8, 289. [Google Scholar] [CrossRef]
- Ikotun, A.M.; Ezugwu, A.E.; Abualigah, L.; Abuhaija, B.; Heming, J. K-means clustering algorithms: A comprehensive review, variants analysis, and advances in the era of big data. Inf. Sci. 2023, 622, 178–210. [Google Scholar] [CrossRef]
- Reynolds, D.A. A Gaussian Mixture Modeling Approach to Text-Independent Speaker Identification; Georgia Institute of Technology: Atlanta, GA, USA, 1992. [Google Scholar]
- Chen, T.; Morris, J.; Martin, E. Probability density estimation via an infinite Gaussian mixture model: Application to statistical process monitoring. J. R. Stat. Soc. Ser. Appl. Stat. 2006, 55, 699–715. [Google Scholar] [CrossRef]
- Davari, A.; Özkan, H.C.; Maier, A.; Riess, C. Fast and Efficient Limited Data Hyperspectral Remote Sensing Image Classification via GMM-Based Synthetic Samples. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2019, 12, 2107–2120. [Google Scholar] [CrossRef]
- Barton, T. Clustering Benchmarks. 2023. Available online: https://github.com/deric/clustering-benchmark (accessed on 18 February 2025).
- Aeberhard, S. Wine Data Set. UCI Machine Learning Repository. 1991. Available online: https://archive.ics.uci.edu/dataset/109/wine (accessed on 14 November 2025).
- Amini, H.; Lee, W.; Di Carlo, D. Inertial microfluidic physics. Lab Chip 2014, 14, 2739–2761. [Google Scholar] [CrossRef]
- Zelnik-Manor, L.; Perona, P. Self-Tuning Spectral Clustering. In Advances in Neural Information Processing Systems; Saul, L., Weiss, Y., Bottou, L., Eds.; MIT Press: Cambridge, MA, USA, 2004; Volume 17. [Google Scholar]
- Ritos, K.; Drikakis, D.; Kokkinakis, I.W. Virus spreading in cruiser cabin. Phys. Fluids 2023, 35, 103329. [Google Scholar] [CrossRef]
- Tatem, A.J. Mapping population and pathogen movements. Int. Health 2014, 6, 5–11. [Google Scholar] [CrossRef]
- Anderson, E.L.; Turnham, P.; Griffin, J.R.; Clarke, C.C. Consideration of the aerosol transmission for COVID-19 and public health. Risk Anal. 2020, 40, 902–907. [Google Scholar] [CrossRef]
- Jarvis, M.C. Aerosol transmission of SARS-CoV-2: Physical principles and implications. Front. Public Health 2020, 8, 590041. [Google Scholar] [CrossRef]
- Microsoft. Bing Image Creator. 2025. Available online: https://www.bing.com/create (accessed on 21 February 2025).
- Loth, E. Numerical approaches for motion of dispersed particles, droplets and bubbles. Prog. Energy Combust. Sci. 2000, 26, 161–223. [Google Scholar] [CrossRef]
- Forina, M.; Leardi, R.; Armanino, C.; Lanteri, S.; Conti, P.; Princi, P. PARVUS—An Extendible Package for Data Exploration, Classification and Correlation; Elsevier Scientific Software: Amsterdam, The Netherlands, 1988. [Google Scholar]
- Aeberhard, S.; Coomans, D.; Vel, O.D. Improvements to the classification performance of RDA. J. Chemom. 1993, 7, 99–115. [Google Scholar] [CrossRef]
| Method | Deterministic Component | Probabilistic Component | Uncertainty Quantification | Noise Handling |
|---|---|---|---|---|
| K-means + GMM [79] | Yes | Yes | No | Limited |
| Bayesian GMM [63] | No | Yes | Yes | Moderate |
| Spectral + GMM [92] | Yes | Yes | No | Moderate |
| Deep Clustering [46] | Yes | Partial | No | Good |
| VADE [64] | Yes | Yes | Partial | Good |
| SUN (Proposed) | Yes | Yes | Yes | Excellent |
| Cluster | True Count | RUN-ICON Count | SUN (RUN-ICON + GMM) Count |
|---|---|---|---|
| 1 | 102 | 148 | 102 |
| 2 | 106 | 129 | 106 |
| 3 | 95 | 26 | 95 |
| Cluster | True Count | RUN-ICON Count | SUN (RUN-ICON + GMM) Count |
|---|---|---|---|
| “head” | 3805 | 1729 | 2095 |
| left “ear” | 1468 | 2462 | 2279 |
| right “ear” | 1375 | 2457 | 2274 |
| Cluster | True Count | RUN-ICON Count | SUN (RUN-ICON + GMM) Count |
|---|---|---|---|
| 1 | 59 | 60 | 63 |
| 2 | 48 | 55 | 50 |
| 3 | 71 | 63 | 65 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).