Tagging Users’ Social Circles via Multiple Linear Regression

A social circle is a category of strong social relationships, such as family, classmates, or good friends. Information diffusion among members of online social circles is frequent and credible. Research on users' online social circles has become popular in recent years, and many scholars have proposed methods for detecting them. The social meanings and tags of a social circle are also important for its analysis. However, little work addresses the discovery of tags for social circles. This paper proposes an algorithm for social circle tag detection by multiple linear regression. The model addresses the data sparsity problem of tags in social circles and successfully combines different categories of features. We also remap the concept of the social circle onto the "reference circles" of an academic paper. We evaluate our method on datasets from both Facebook and Microsoft Academic Search, and show that it is more effective than other relevant methods.


Introduction
Social media is a popular communication platform. Compared with other information networks, user relationships promote more effective dissemination of information on social networks. There are social media with different functions, such as online academic networks, recommendation networks, and social network services (SNS). In these networks, users communicate with their friends and share information about their similar interests. Users usually have strong relationships with, and similar backgrounds to, their friends.
A user has many categories of strong relationships in social media. These strong relationships constitute the user's online social circles. A user has several social circles on Twitter and Facebook, such as classmates in a school and colleagues in a company (Figure 1). A social circle reflects an individual's social environment, which can often be leveraged to infer important information about that individual's attitudes, behaviors, and decisions [1][2][3][4][5][6][7]. However, this type of social circle is different from the traditional community.
Viewed as a graph, the distribution of edges in a real network is inhomogeneous not only globally but also locally, with high concentrations of edges within particular groups of vertices and low concentrations between these groups. This feature of real networks is called community structure [8]. A node of a community may be a person, but it may equally be a computer on the Internet or a gene in a gene network. Firstly, a traditional community is usually larger than a social circle, which may have just three members (such as three family members). For example, a residential area can be a community in a mobile communication network. The area may have several thousand people, but it is not anyone's social circle. Secondly, although the members of a community share some common tags or profiles, they may not know each other. In contrast, the members of a social circle have strong relationships with one another. In social science, weak ties are human relationships (acquaintance, loose friendship, etc.) that are less binding than family and close friendship but might, according to Granovetter [9], yield better access to information and opportunities [10]. Tie strength can measure the quality of a community [11,12]. A community may be held together by weak ties, and sometimes its members have no explicit social relationships at all. Social circles, however, typically consist of strong ties. For instance, a community in a mobile network depends on location, not on explicit social relationships. As suggested above, community detection focuses on finding arbitrary highly interconnected subgraphs within larger networks, whereas social circle detection discovers groups of strong social relationships that include one or more specific individuals [13]. Given these characteristics, detecting and analyzing users' social circles is valuable for research on social networks and user behavior. Although there is much work on community detection, many scholars have proposed algorithms specifically for detecting and analyzing social circles [13][14][15][16][17].
Furthermore, the concept of a circle in academic networks refers to a group of papers that have strong relationships. A paper may cite other papers with several intentions. Some references are relevant to its research problems, and others are relevant to its methodologies. If a paper cites another paper with similar topics or themes, we can regard the two papers as friends that have a strong relationship with respect to content (Figure 2).
Every social circle should have tags that represent its social meaning. For example, a social circle of family members may have tags about a city or a town (such as Beijing), and a social circle of classmates may have tags about a university and a major (such as Tsinghua University, Computer Science). So far, there are no public social circle datasets with annotated tags. The ground truth of a social circle is also difficult to collect because of privacy issues. Each person has her or his own social circles, and can only annotate the members and the social meanings of those circles. As a result, social circle datasets are usually small, and it is not easy to evaluate the performance of a social circle detection algorithm.
Research on online social circles has attracted strong interest from scholars in recent years. In particular, a lot of work on social circle detection has appeared, and this work distinguishes the area from traditional community detection. However, detecting and mining social circles is only a foundation for online social circle research, and little work addresses tag detection for ground-truth social circles. One reason for this lack of research is that traditional social tag recommendation aims to recommend tags for social resources, not for users and social groups in general. In addition, the number of members in a social circle is small, and most users have only a few tag items. Some users have no tags at all. General topic models are therefore poorly suited to tag detection for social circles, and existing algorithms are difficult to apply. These factors make the social meanings of social circles difficult to identify. In this paper, we propose an algorithm for detecting social circle tags via multiple linear regression. The algorithm detects profiles that can represent a social circle from the information of its members. Using many features of members' relationships and tags, the model trains its parameters according to the characteristics of social circle data. It then assigns a weight to every tag for a social circle. The top-ranked tags can serve as representative tags of the social circle. Because the model considers network topology, the tags of important members receive higher weights. This compensates for the lack of members' tags in social circles. On the Facebook dataset, the model detects social circles' tags precisely. On the Microsoft Academic Search dataset, it detects keywords for reference circles and filters out redundant words. On the two datasets, it improves precision by 11% and 23%, respectively, compared to relevant methods based on text information.
The rest of this paper is organized as follows. In the next section, we describe related work. We then introduce our methodology for tag detection of social circles in Section 3, and describe the datasets and the experiment in Section 4. Finally, we conclude our work and point out avenues for future research.

Social Circle
The social circle is closely related to the term community, which has two interpretations: one is the geographical notion of community and the other is relational. The second is mainly concerned with people's relationships, without reference to location [18]. In this paper, we consider online social circles in the second sense, which concern people's relationships. The detection of social circles is a new research area that is emerging with the popularity of social media. It is a clustering problem within the ego network. Members within a social circle not only have dense relationships, but also share some common tags. In general, a social circle is a group of strong social relationships with a specified social meaning.
A growing number of scholars study their subjects from the perspective of social circles. Recommendation algorithms based on users' circles perform no worse than those based on the full network [19]. Question recommendation, question popularity analysis, and prediction based on social circles achieve better performance [20,21]. Social circles are also a main feature for linking users across online social networks [22].
Huberman et al. proposed that it is necessary to mine users' real friends [23]. The evolution of online social groups is analyzed and predicted in [24]. Qu and Liu propose a semi-supervised method to detect social circles in Twitter [25]. However, some members of user groups are not users' real friends. Many groups merely classify different types of followees, and many of these followees are neither bilateral friends nor strong relationships of the user. Based on Google+, [26] explores users' motivations for creating social circles and sharing information in them. That paper observes users' behavior to identify strategies for improving sharing precision through selective sharing. It also analyzes the names of social circles, such as family or friends. It shows that members of most social circles usually have strong ties, but it does not explore the specific profiles and tags of social circles. The authors of [27] propose a visualization tool for social interaction, which can also be used to visualize users' social circles. Since 2012, several dedicated social circle detection algorithms have been proposed [13][14][15][16][17]. These algorithms can detect users' strong relationships in social media, and they form the foundation of social circle analysis.

Tag Detection
Reference [28] studies tagging behavior in Twitter. Users tag their tweets to filter tweets by topic. The paper proposes that these tags can indicate the lifetime of topics in Twitter. However, it does not address users' or social circles' tags. The most straightforward unsupervised method for tag detection is to use TF-IDF (detailed in Section 3.2) [29] to rank candidate tags and select the top M as tags. TF-IDF ranks candidate keywords according to the tags alone, and therefore fails to consider the topological structure among social circle members. To the best of our knowledge, there is little work on tag detection for social circles. In 2012, Liu proposed frequency-based keyword extraction (FKE) to detect users' tags based on text in social networks [30]. However, social tags are sparser than text, and it is usually difficult to collect users' text. The authors of [16] detect tags of social circles from the accounts that members follow. That method cannot detect the tags of a social circle from members' own characteristics. Moreover, many scholars have explored the relationships among scientific topics as more academic data has been released. Topic models are a common technique for studying the evolution of research themes [31,32] and discovering high-quality papers [33]. Work on citation prediction combines features of links and topics [34], which can also improve the precision of topic detection [35,36]. Bolelli clusters papers into different topics with sparse citations [37]. However, work on clustering an academic paper's references is still absent. Furthermore, under data sparsity, these algorithms are difficult to apply to tag detection for ego social circles.

Overview
In social networks, every user usually has many social circles, and every social circle has its social meaning, such as family or colleagues. Predicting the social meaning of a social circle is significant for the analysis of social circles in social media. However, it is difficult to identify the social meaning of a social circle directly from members' tags.
Some tags are common to most social circles, such as degree type. Other tags belong only to a single user, such as user ID. Neither kind can serve as a representative tag or attribute of one social circle. Ideally, the representative tags of a social circle are owned by most members of the circle. However, because individuals' tags are often missing, this criterion does not always work.
To address these problems in discovering meaningful tags, we propose a multiple linear regression model that detects the tags of social circles by combining features of the topological structure with features of the members' tags. We regard tag detection as a tag ranking problem within a social circle. The model gives every tag a score for a social circle, and tags with higher scores are more likely to be tags of the social circle.

Features
User relationships are an effective feature in much work on social circle detection [14,15]. They reflect users' importance in a social circle, and more authoritative users can contribute more important tags. We therefore choose features based on both users' relationships and users' tags. We use all users' tags; every tag describes part of a user's profile, such as school or location (Table 1). A user may have one or several items for every type of tag, and different items usually have different values.

The tag types in Table 1 are: last_name, first_name, birthday, name, gender, locale, hometown-name, hometown-id, education-school-name, education-school-id, education-type, education-year-name, education-year-id, education-concentration-name, education-concentration-id, id, location-name, location-id, education-classes-from-name, education-classes-from-id, education-classes-with-name, education-classes-with-id, education-classes-name, education-classes-id, work-position-name, work-position-id, work-start_date, work-end_date, work-employer-name, work-employer-id, work-location-name, work-location-id, languages-name, languages-id, middle_name, work-projects-name, work-projects-id, education-with-name, education-with-id, work-projects-with-name, work-projects-with-id, work-description, education-degree-name, education-degree-id, work-projects-start_date, work-with-name, work-with-id, work-projects-from-name, work-projects-from-id, education-classes-description, work-from-name, work-from-id, political, religion, work-projects-end_date, work-projects-description, location.

(1) Percentage of members who own the tag

$$\mathrm{Fea}_1 = \frac{|MEM_t|}{|MEM|}$$

where t is a tag, |MEM_t| is the number of members who own this tag, and |MEM| is the number of members in the circle.

(2) The average centrality of the members who own the tag

$$\mathrm{Fea}_2 = \frac{1}{|MEM_t|} \sum_{u \in MEM_t} \mathrm{InnerDegree}(u)$$

where u is a circle member who owns the tag and InnerDegree(u) is the number of this user's friends in the circle.

(3) The tag's TF (Term Frequency) value in a circle

In our work, we regard the set of all members' tags in a social circle as a document, and every tag in the set as a word of this document. For example, suppose a user has two tags, user:id:27 and school:id:10. The two items are words, and all members' words constitute the tag document of this social circle. Count(tag) is the number of occurrences of a tag item in the circle.

(6) If only one user owns the tag

If only one user owns this tag, Fea 6 is 1; otherwise, Fea 6 is 0.

(7) If only one social circle owns the tag

If only one social circle owns this tag, Fea 7 is 1; otherwise, Fea 7 is 0.

(8) Prefix of the tag

Some tags cannot be tags of social circles since they can only belong to a single user, such as user:id. We filter the types of all tags: if a tag might be a social circle's tag, Fea 8 is 1; otherwise, Fea 8 is 0.
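The feature definitions can be illustrated with a minimal Python sketch on toy data. It is not the authors' implementation. The member names, the tag items, the `PERSONAL_PREFIXES` list, and the function names are all hypothetical. The sketch also makes several assumptions that the text does not confirm: Fea 1 is divided by the circle size, Fea 2 is the plain mean of InnerDegree, Fea 3 is the raw count of the tag, and Fea 5 is the product of TF and IDF. Fea 4 follows the paper's IDF definition (the logarithm of the total number of circles over the number of circles having the tag).

```python
import math
from collections import Counter

# Hypothetical toy circle: each member maps to a list of tag items.
circle_tags = {
    "alice": ["user:id:27", "school:id:10", "location:id:3"],
    "bob": ["user:id:31", "school:id:10"],
    "carol": ["user:id:45", "school:id:10", "location:id:3"],
}
# Undirected friendships inside the circle, used for InnerDegree(u).
circle_edges = [("alice", "bob"), ("alice", "carol")]
# Tag sets of all circles in the toy dataset, used for IDF and Fea 7.
all_circles = [
    {"user:id:27", "user:id:31", "user:id:45", "school:id:10", "location:id:3"},
    {"user:id:52", "location:id:3"},
]
# Assumed prefixes of tags that can only belong to a single user.
PERSONAL_PREFIXES = ("user:id",)


def inner_degree(member, edges):
    """Number of the member's friends inside the circle."""
    return sum(member in edge for edge in edges)


def tag_features(tag, tags_by_member, edges, circles):
    """Return [Fea 1, ..., Fea 8] for a tag that occurs in the circle."""
    owners = [m for m, tags in tags_by_member.items() if tag in tags]
    counts = Counter(t for tags in tags_by_member.values() for t in tags)
    circles_with_tag = sum(tag in c for c in circles)
    fea1 = len(owners) / len(tags_by_member)          # share of members owning the tag
    fea2 = sum(inner_degree(m, edges) for m in owners) / len(owners)
    fea3 = counts[tag]                                # raw count as TF (assumption)
    fea4 = math.log(len(circles) / circles_with_tag)  # IDF over circles
    fea5 = fea3 * fea4                                # TF-IDF (assumption)
    fea6 = 1 if len(owners) == 1 else 0               # only one user owns the tag
    fea7 = 1 if circles_with_tag == 1 else 0          # only one circle owns the tag
    fea8 = 0 if tag.startswith(PERSONAL_PREFIXES) else 1
    return [fea1, fea2, fea3, fea4, fea5, fea6, fea7, fea8]


school = tag_features("school:id:10", circle_tags, circle_edges, all_circles)
user_id = tag_features("user:id:27", circle_tags, circle_edges, all_circles)
```

In this toy circle, `school:id:10` is owned by every member, so its Fea 1 is 1.0, whereas `user:id:27` is flagged by Fea 6 and filtered by Fea 8.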

Multiple Linear Regression
The algorithm computes a score for every tag in a social circle by multiple linear regression. The model uses all the features described in the previous section. Let C be a circle and t a tag; the score F(C, t) is computed by Equation (6). In the training set, we give the ground-truth tags of every circle high weights and set the scores of negative tags to 0. The loss function is Equation (7). We use QR decomposition to find all θs, which gives more numerically stable results. QR decomposition is a matrix factorization method that is often used to solve the linear least squares problem.
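As a rough illustration of fitting the θs by QR decomposition, the following NumPy sketch solves an ordinary least-squares problem and ranks candidate tags by their scores. It is a sketch under stated assumptions, not the authors' code: the intercept column, the target value 1.0 for ground-truth tags, the two-feature toy rows, and the names `fit_theta` and `score_tags` are all hypothetical.

```python
import numpy as np


def fit_theta(features, targets):
    """Least-squares fit of targets ≈ [1, features] @ theta using QR."""
    # Prepend a column of ones for the intercept term (an assumption).
    design = np.column_stack([np.ones(len(features)), features])
    q, r = np.linalg.qr(design)          # reduced QR: design = q @ r
    # Solve the upper-triangular system r @ theta = q.T @ targets.
    return np.linalg.solve(r, q.T @ targets)


def score_tags(theta, features):
    """Score each candidate tag (one row of features per tag)."""
    design = np.column_stack([np.ones(len(features)), features])
    return design @ theta


# Hypothetical training rows: two features per (circle, tag) pair.
train_x = np.array([[1.0, 2.0], [0.5, 1.0], [0.2, 3.0], [0.9, 0.5], [0.1, 0.1]])
train_y = np.array([1.0, 0.0, 0.0, 1.0, 0.0])   # 1 = ground-truth tag, 0 = negative
theta = fit_theta(train_x, train_y)

# Rank the candidate tags of a new circle by descending score.
candidates = ["school:id:10", "location:id:3", "user:id:27"]
cand_x = np.array([[1.0, 1.3], [0.7, 1.5], [0.3, 2.0]])
ranking = [candidates[i] for i in np.argsort(-score_tags(theta, cand_x))]
```

Solving through the QR factors avoids forming the normal equations, which is the usual reason QR is preferred for numerical stability in least-squares problems.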

Dataset
The algorithm is evaluated on both a Facebook network and an academic network. The Facebook dataset was released on Kaggle [38]. It includes 60 users and their social circles. The 60 users annotated their social circles and the members of those circles in Facebook. There are 17,115 friends in all social circles; on average, every user has 19.73 social circles and every social circle has 28.91 friends. The task is to detect tags for these social circles. The dataset includes all users' networks and tags, as well as the ground truth of every user's social circle memberships. We annotate the ground truth of social circle tags using members' tags. There are 227 social circles in the training set and 315 social circles in the test set. We take characteristics of users' relationships and tags as the features of multiple linear regression. The resulting tags can represent the social attributes of circles.
The academic paper dataset is extracted from Microsoft Academic Search [39]. There are 50 papers in the dataset. In these papers, the related work is divided into several sections. All references in a section are relevant to one research problem or one research methodology, so these sections of references can be regarded as ground-truth reference circles. The citations among references can be regarded as social relationships. The task is to detect topics and keywords for these circles. We annotate the technical terminology in the titles and abstracts of these papers, and regard these terms as the candidate keywords of every single paper. There are 46 reference circles in the training set and 62 reference circles in the test set.

Baseline
There is no dedicated algorithm for tag detection of social circles, and topic analysis methods are difficult to apply to such sparse social circle tag data. For tag mining, we choose popular tags, TF-IDF, and frequency-based keyword extraction (FKE) as the baselines. FKE measures the weights of all members' tags in the social circle. We rank candidate tags by term frequency and member frequency (TF-MF). Let T be the set of all members' tags in the circle, with tag t ∈ T. TF_t is the number of occurrences of t in T, and |t| is the length of tag t. We define member frequency as Equation (8) and TF-MF as Equation (9).

Result Analysis
A social circle may have several tags that represent the circle's attributes. Therefore, we select the top 10 tags in the algorithm's results as the tags of a social circle. The evaluation metric is precision@10, the fraction of correct tags among the top 10, computed in every social circle (Equations (10) and (11)).

$$P_{Circle} = \frac{\text{Correct Profiles in Top 10}}{10} \qquad (10)$$

On both the Facebook and Microsoft Academic Search datasets, our method outperforms the baselines (Table 2). Our method can extract key tags and keywords for different social circles and for different parts of academic related work. The text-based tag detection method (FKE) achieves good results when a large number of posts is available. However, tag data are sparse, and FKE cannot take users' topological structure into account. The precision improvements are 11% and 23% on the two datasets, respectively. This shows that our method performs well on both tag detection for social circles and keyword detection for reference circles.
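The precision@10 metric of Equation (10) can be sketched in a few lines of Python. The function names are hypothetical, and averaging the per-circle precision over all circles is an assumption about how the per-circle values are aggregated, since that step is not spelled out in the text.

```python
def precision_at_k(ranked_tags, true_tags, k=10):
    """Fraction of the top-k ranked tags that are ground-truth tags."""
    top_k = ranked_tags[:k]
    return sum(tag in true_tags for tag in top_k) / k


def mean_precision(results, k=10):
    """Average precision@k over circles (aggregation is an assumption).

    `results` is a list of (ranked_tags, true_tags) pairs, one per circle.
    """
    return sum(precision_at_k(r, t, k) for r, t in results) / len(results)


# Hypothetical ranking for one circle: 10 candidate tags, 2 of them correct.
ranked = ["t%d" % i for i in range(10)]
truth = {"t0", "t2"}
p = precision_at_k(ranked, truth)
```

Note that the denominator stays at k even when a circle has fewer than k candidate tags, which matches the fixed denominator of 10 in Equation (10).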
We run multiple linear regression with every single feature on the Facebook dataset (Figure 3). The results show that TF-IDF is the strongest feature for tag detection of social circles: when the model uses only one feature, the trained θ achieves the best performance with TF-IDF. Building on TF-IDF, multiple linear regression combines it with the other features and improves detection precision effectively. The model can also be transferred easily to other similar tag detection problems for social circles.

Conclusion
In this paper, we propose a tag detection algorithm for social circles based on multiple linear regression. The model infers the social meanings of social circles from all members' memberships and their tags. Following the detection of social circles, this work makes it possible to analyze the attributes of users' social circles in depth. This paper also transfers the concept of the social circle to the network of academic papers, where the model can detect the keywords of papers' reference circles. This helps in understanding the topics of a paper's references more precisely and in a more focused way. In the future, we will try to complement users' tags with those of their friends in the same social circles. We will also analyze author circles in an academic network according to their research areas and co-author relationships.

Figure 2. The main references of the paper "A Generative Blog Post Retrieval Model that Uses Query Expansion Based on External Collections" are relevant to two themes: Query Modeling and External Expansion.

(4) The tag's IDF (Inverse Document Frequency) value in a circle

$$\mathrm{Fea}_4 = \log \frac{\text{Total of Circles}}{\text{Count of Circles Having the Tag}}$$

Figure 3. Precision of Multiple Linear Regression with Every Single Feature.

Table 1. Types of tags.

Table 2. Result of Circle Tag Detection.