Abstract
We propose a novel sparse clustering method for high-dimensional mixed-type data that integrates Azzalini’s score-based encoding for ordinal variables. The approach preserves the inherent nature of each variable type (continuous, ordinal, and nominal) while improving clustering quality and interpretability. To this end, we extend classical distance metrics and adapt the Davies–Bouldin Index (DBI) to better reflect the structure of mixed data. We also introduce a weighted formulation that accounts for the distinct contributions of the variable types to the clustering process. Empirical results on simulated and real-world datasets show that our method consistently yields better-separated and more cohesive clusters than traditional techniques, while effectively identifying the most informative variables. This work opens promising directions for clustering in complex, high-dimensional settings such as marketing analytics and customer segmentation.