Adaptive Sparse Clustering of Mixed Data Using Azzalini-Encoded Ordinal Variables

Ismail Arjdal; Mohamed Alahiane; Echarif Elharfaoui; Mustapha Rachdi

doi:10.3390/axioms14120902

¹ Department of Mathematics, Faculty of Sciences, Chouaib Doukkali University, El Jadida 24000, Morocco

² LERSEM Laboratory, National School of Business and Management, Chouaib Doukkali University, Corner Avenue Ahmed Chaouki and Rue de Fès, BP.122, El Jadida 24000, Morocco

³ AGEIS Laboratory, Grenoble Alpes University (UGA), UFR SHS, BP.47, CEDEX 09, 38040 Grenoble, France

* Author to whom correspondence should be addressed.

Axioms 2025, 14(12), 902; https://doi.org/10.3390/axioms14120902

This article belongs to the Special Issue Stochastic Modeling and Optimization Techniques

Abstract

In this paper, we propose a novel sparse clustering method designed for high-dimensional mixed-type data that integrates Azzalini’s score-based encoding for ordinal variables. Our approach aims to retain the inherent nature of each variable type (continuous, ordinal, and nominal) while enhancing clustering quality and interpretability. To this end, we extend classical distance metrics and adapt the Davies–Bouldin Index (DBI) to better reflect the structure of mixed data. We also introduce a weighted formulation that accounts for the distinct contributions of variable types in the clustering process. Empirical results on simulated and real-world datasets demonstrate that our method consistently achieves better cluster separation and coherence than traditional techniques, while effectively identifying the most informative variables. This work opens promising directions for clustering in complex, high-dimensional settings such as marketing analytics and customer segmentation.
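The abstract names three ingredients: a numeric encoding of ordinal levels, a dissimilarity that weights continuous, ordinal, and nominal components separately, and a Davies–Bouldin index adapted to mixed data. The Python sketch below shows one way such pieces could fit together; it is an illustration under stated assumptions, not the authors' implementation. The ordinal encoding uses mid-point normal scores (the inverse normal CDF of the midpoint of each level's cumulative-probability interval) as a stand-in for Azzalini's score encoding, whose exact form is not given here. The type weights `w_cont`, `w_ord`, and `w_nom` are hypothetical fixed values rather than learned sparse weights, and the index swaps centroids for medoids so that nominal variables never need averaging. All function names are hypothetical.

```python
import math
from collections import Counter
from statistics import NormalDist


def normal_scores(values, levels):
    """Encode ordinal labels as mid-point normal scores.

    Assumption: a stand-in for the paper's Azzalini score encoding.
    `levels` lists the categories from lowest to highest.
    """
    counts = Counter(values)
    n = len(values)
    scores, cum = {}, 0.0
    for lev in levels:
        p = counts.get(lev, 0) / n
        # Midpoint of the level's cumulative-probability interval,
        # clamped so inv_cdf never receives exactly 0 or 1.
        mid = min(max(cum + p / 2, 1e-6), 1 - 1e-6)
        scores[lev] = NormalDist().inv_cdf(mid)
        cum += p
    return [scores[v] for v in values]


def mixed_distance(a, b, w_cont=1.0, w_ord=1.0, w_nom=1.0):
    """Hypothetical type-weighted dissimilarity between two records.

    Each record is a dict with lists under "cont" (numeric values),
    "ord" (encoded ordinal scores) and "nom" (raw categories).
    """
    def mean(xs):
        return sum(xs) / len(xs) if xs else 0.0

    d_cont = mean([(x - y) ** 2 for x, y in zip(a["cont"], b["cont"])])
    d_ord = mean([(x - y) ** 2 for x, y in zip(a["ord"], b["ord"])])
    d_nom = mean([float(x != y) for x, y in zip(a["nom"], b["nom"])])  # mismatch rate
    return math.sqrt(w_cont * d_cont + w_ord * d_ord + w_nom * d_nom)


def davies_bouldin_medoid(points, labels, dist):
    """Davies-Bouldin index with medoids in place of centroids (lower is better)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    medoids, scatter = {}, {}
    for lab, members in clusters.items():
        # Medoid: the member with the smallest total distance to the others.
        med = min(members, key=lambda m: sum(dist(m, q) for q in members))
        medoids[lab] = med
        scatter[lab] = sum(dist(q, med) for q in members) / len(members)
    total = 0.0
    for i in clusters:
        ratios = []
        for j in clusters:
            if j == i:
                continue
            sep = dist(medoids[i], medoids[j])
            ratios.append(math.inf if sep == 0 else (scatter[i] + scatter[j]) / sep)
        total += max(ratios)  # worst-case similarity to any other cluster
    return total / len(clusters)


# Toy usage: two well-separated groups, one variable of each type.
ordinal = normal_scores(["low", "low", "high", "high"], ["low", "mid", "high"])
points = [
    {"cont": [c], "ord": [o], "nom": [k]}
    for c, o, k in zip([0.0, 0.1, 5.0, 5.1], ordinal, ["a", "a", "b", "b"])
]
dbi_good = davies_bouldin_medoid(points, [0, 0, 1, 1], mixed_distance)
dbi_bad = davies_bouldin_medoid(points, [0, 1, 0, 1], mixed_distance)
print(f"DBI true labels: {dbi_good:.3f}, shuffled labels: {dbi_bad:.3f}")
```

On this toy data the medoid-based index is small for the true labelling and large for the shuffled one, consistent with the usual reading of the DBI (lower means more compact, better separated clusters). The sparse variable-selection step mentioned in the abstract is not reproduced in this sketch.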

Keywords:

sparse clustering; mixed-type data; Azzalini’s score; ordinal variables; nominal variables; high-dimensional data; Davies–Bouldin index; variable selection
