Text data on social network platforms take the form of short texts, and at scale such data are high-dimensional and sparse, which causes traditional clustering algorithms to perform poorly. In this paper, a new community detection method based on the sparse subspace clustering (SSC) algorithm is proposed to address the sparsity and high dimensionality of short texts in online social networks. The main idea is as follows. First, structured data, including user attributes and user behavior, and unstructured data, such as user reviews, are used to construct the vector space for the network, and the similarity of feature words is calculated from their relative positions in the synonym word forest. Then, the dimensionality of the data is reduced by principal component analysis in order to improve clustering accuracy. Further, a new SSC-based community detection method for social network members is proposed. Finally, experiments are performed on several data sets, and the results are compared with those of the K-means clustering algorithm. The experimental results show that proper dimension reduction of high-dimensional data can improve the clustering accuracy and efficiency of the SSC approach, and that the proposed method achieves a suitable community partition on online social network data sets.
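The pipeline summarized above (dimension reduction by principal component analysis followed by SSC) can be illustrated with a minimal sketch. It is not the implementation evaluated in this paper: the data are synthetic points drawn from two orthogonal subspaces rather than user vectors, the function names `pca_reduce`, `sparse_self_representation`, and `spectral_labels` are hypothetical, and the choices of an ISTA solver for the sparse self-representation, the penalty `lam`, and a spectral clustering step on the resulting affinity matrix are assumptions made for illustration only.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project the rows of X onto the top principal components (PCA via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def sparse_self_representation(X, lam=0.05, n_iter=300):
    """Approximately solve min_C 0.5*||X - C X||_F^2 + lam*||C||_1 with diag(C) = 0
    by proximal gradient descent (ISTA). Rows of X are data points.
    lam and n_iter are illustrative values, not taken from the paper."""
    G = X @ X.T                                  # Gram matrix of the points
    step = 1.0 / np.linalg.norm(G, 2)            # 1 / Lipschitz constant of the gradient
    C = np.zeros_like(G)
    for _ in range(n_iter):
        C = C - step * (C @ G - G)               # gradient step on the quadratic term
        C = np.sign(C) * np.maximum(np.abs(C) - lam * step, 0.0)  # soft thresholding
        np.fill_diagonal(C, 0.0)                 # a point may not represent itself
    return C

def spectral_labels(W, k, n_iter=20):
    """Cluster a symmetric affinity matrix with normalized spectral clustering."""
    d = np.maximum(W.sum(axis=1), 1e-12)         # node degrees, guarded against zero
    L = np.eye(len(W)) - W / np.sqrt(np.outer(d, d))   # normalized graph Laplacian
    _, vecs = np.linalg.eigh(L)                  # eigenvalues in ascending order
    E = vecs[:, :k]
    E = E / np.maximum(np.linalg.norm(E, axis=1, keepdims=True), 1e-12)
    # Plain k-means with a deterministic farthest-point initialization.
    centers = [E[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((E - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(E[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((E[:, None, :] - centers[None]) ** 2).sum(axis=2), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = E[labels == j].mean(axis=0)
    return labels

# Synthetic stand-in for user vectors: two "communities", each lying in its own
# 2-D subspace of a 30-dimensional feature space (an assumption for this demo).
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((30, 4)))   # 4 orthonormal directions
points = []
for basis in (Q[:, :2], Q[:, 2:]):                  # two orthogonal 2-D subspaces
    coeffs = rng.standard_normal((20, 2))
    coeffs -= coeffs.mean(axis=0)                   # keep the overall data mean at zero
    points.append(coeffs @ basis.T)
X = np.vstack(points)                               # 40 points, 30 features

Z = pca_reduce(X, n_components=4)                   # dimension reduction step
Z = Z / np.linalg.norm(Z, axis=1, keepdims=True)    # unit-norm points, as is usual for SSC
C = sparse_self_representation(Z)
W = np.abs(C) + np.abs(C).T                         # symmetric affinity matrix
labels = spectral_labels(W, k=2)                    # one community label per point
```

In this sketch each point is expressed as a sparse combination of the other points, so nonzero coefficients tend to link points from the same subspace, and the affinity matrix `W` then encodes the community structure. On real short-text data the subspaces are neither orthogonal nor noise-free, so `lam` and the number of retained components would need tuning.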
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.