Real stressed speech is affected by various factors (individual characteristics and environment), so stress patterns are diverse and differ from one individual to another. To address this, in our previous work we proposed deep time-delay embedded clustering (DTEC), an unsupervised clustering method that works in a self-learning manner by simultaneously learning the feature representations of stressed speech and performing the clustering task. However, DTEC has not yet confirmed the compatibility between the output classes and the informational classes. Therefore, we propose semi-supervised time-delay embedded clustering (SDTEC), a new semi-supervised framework built on DTEC. SDTEC incorporates prior information in the form of pairwise constraints into the embedding layer and simultaneously learns the feature representation and the cluster assignments. The prior information guides the clustering procedure so that points assigned to an incorrect cluster can be corrected. The effectiveness of the proposed SDTEC is evaluated by comparing it with several baseline methods in terms of the clustering error rate (CER). Moreover, to demonstrate SDTEC’s capabilities, we conduct a comprehensive ablation study. The experimental results show that SDTEC outperforms the baseline methods and achieves state-of-the-art results in semi-supervised clustering.
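As an illustration of how pairwise constraints can act on an embedding, the following minimal sketch penalizes must-link pairs that lie far apart and cannot-link pairs that lie closer than a margin. The function name, the contrastive-style form of the penalty, and the `margin` parameter are assumptions made for illustration only; the actual constraint term used in SDTEC may differ.

```python
from math import dist


def pairwise_constraint_penalty(embeddings, must_link, cannot_link, margin=1.0):
    """Hypothetical contrastive-style penalty over pairwise constraints.

    embeddings  : sequence of equal-length coordinate tuples
    must_link   : index pairs (i, j) that should share a cluster
    cannot_link : index pairs (i, j) that should be in different clusters
    margin      : assumed minimum distance wanted between cannot-link pairs
    """
    penalty = 0.0
    for i, j in must_link:
        # Must-link pairs are penalized by their squared distance.
        penalty += dist(embeddings[i], embeddings[j]) ** 2
    for i, j in cannot_link:
        # Cannot-link pairs are penalized only when closer than the margin.
        gap = margin - dist(embeddings[i], embeddings[j])
        penalty += max(0.0, gap) ** 2
    return penalty


# Toy embedding: points 0 and 1 are close, point 2 is far away.
embeddings = [(0.0, 0.0), (0.0, 0.5), (3.0, 4.0)]

# Constraints that agree with the geometry give a small penalty.
satisfied = pairwise_constraint_penalty(
    embeddings, must_link=[(0, 1)], cannot_link=[(0, 2)]
)

# Constraints that contradict the geometry give a large penalty.
violated = pairwise_constraint_penalty(
    embeddings, must_link=[(0, 2)], cannot_link=[(0, 1)]
)
```

In a semi-supervised setting, a term of this kind would typically be added to the clustering objective so that embeddings violating the prior information are pushed toward a consistent assignment.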
This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.