Preference for Number of Friends in Online Social Networks

Abstract: Preferences or dislikes for specific numbers are ubiquitous in human society. In traditional Chinese culture, people show special preference for some numbers, such as 6, 8, 10, 100, 200, etc. By analyzing the data of 6.8 million users of Sina Weibo, one of the largest online social media platforms in China, we discover that users exhibit a distinct preference for the number 200, i.e., a significant fraction of users prefer to follow 200 friends. This number, which is very close to the Dunbar number that predicts the cognitive limit on the number of stable social relationships, motivates us to investigate how the preference for numbers in traditional Chinese culture is reflected on social media. We systematically portray users who prefer 200 friends and analyze several of their important social features, including activity, popularity, attention tendency, regional distribution, economic level, and education level. We find that the activity and popularity of users with the preference for the number 200 are relatively lower than those of other users. They are more inclined to follow popular users, and their social portraits change relatively slowly. Besides, users who have a stronger preference for the number 200 are more likely to be located in regions with underdeveloped economies and education. This indicates that users with the preference for the number 200 are likely to be vulnerable groups in society and are easily affected by opinion leaders.


Introduction
Numbers play a prominent role in the daily life of human beings. As the motto of the Pythagorean school goes, "all is number" [1][2][3]. The Pythagoreans used numbers to describe the world, attempted to capture the essence of all existing things, and associated social meanings with the numbers. For example, they believed that odd numbers were masculine and even numbers were feminine [4], that the number one stood for the origin of everything, and that the number ten was the number of the universe [5]. This kind of mystical worship of numbers became Pythagorean numerology, which in turn affected many aspects of people's lives. To this day, similar numerology or number preference still exists widely in various cultures around the world [6][7][8][9]. For example, in China the number eight represents good fortune and wealth, while the number four represents bad fortune [10][11][12]. Similarly, in a few Christian countries, the number thirteen is seen as unlucky [13,14]. However, in Italy, thirteen is considered a lucky number. In modern society, the preference for numbers has a profound influence on people's decision making, and it has been extensively studied in the field of consumer research [15][16][17][18][19][20][21][22][23][24].
With the rapid development of the Internet and the increasing popularity of intelligent terminals, global online social media represented by Facebook, Twitter, WeChat, Weibo, and TikTok have formed a new trans-national, trans-ethnic, and trans-cultural online community with billions of people [25][26][27][28][29]. These social media platforms play an indispensable role in daily life. The large amount of data on user interaction behavior provides a solid foundation for studying the online presentation of human behavior. Based on the data generated by social media, we can infer users' characteristics [30,31] and design more accurate recommendation systems [32]. In addition, users' social profiles can reflect their personality [33][34][35] and their preferences for different ways of information consumption [36]. In terms of specific applications, user interaction data on social media can be applied to event prediction [37,38], bot detection [39,40], and fake news detection [41]. Further, social media also provides large-scale data on network topology and information transmission trajectories. Thus, we can analyze the evolution of users' attention [42] and discourse power in the network [43].
In the present research, we focus on the preference for numbers in the online social network Sina Weibo, especially the choice of the number of friends. We find that the distribution of users' number of friends has a very significant peak near 200. Weibo's network structure formation mechanism is the same as that of Twitter. Users are free to decide whom and how many users to follow (i.e., friends). However, they cannot determine who follows them (i.e., followers) or how many followers they have. In this system, the numbers of users' followers are usually heterogeneous [44,45]. In other words, most users have a quite limited number of followers, while a minority of users, such as social celebrities and entertainment stars, have a large number of followers. However, due to their limited time and attention, users can only maintain effective social relationships with a certain number of friends. Therefore, the numbers of effective friends of users are generally homogeneous [46], as exemplified by the Dunbar number (150) [47]. In this paper, we mainly study users in Sina Weibo who have a particular preference for 200 friends and investigate their important social properties, such as activity, popularity, attention tendency, portrait evolution, regional difference, and regional economic and educational development level.

Related Work
Number preference is a phenomenon that exists in different countries, fields, and cultures. In [10], the authors studied price ending strategies in Asia and found that people prefer prices ending with eight in Asian cultures for four main reasons: luck, market norm, appeal, and value image. In [11], the authors analyzed superstitions and price clustering in the Taiwan Stock Exchange. The results showed that prices cluster on the Chinese lucky numbers (three, six, eight, and nine) rather than on the unlucky numbers (four and seven). In [48], the authors analyzed companies listed on the Taiwan stock market. The results revealed that firms could gain return premiums on the stock market if their listing codes include lucky numbers. The authors in [12] studied the value of superstition about the number eight by using car plate auction data in Malaysia. They found that the number eight is considered lucky by ethnic Chinese people. The authors in [23] compared the effects of integers and non-integers on consumers' evaluation and judgment of related target entities. The results revealed that using non-round numbers would increase attention to numerical values. In [49], the authors used Nigerian lottery games to analyze players' number preferences. They found that the two most popular numbers are one and nine, and that players tried to avoid multiples of seven and numbers containing seven. The authors in [24] explored the degree to which voters can translate their ideological predispositions into numerical policy preferences and found that their ideological differences are reflected in numerical preferences.
With the rise of social media, people have become more and more involved in online life. The large amount of user behavior data has facilitated research in related fields. Based on social media user attributes, researchers focus on predicting users' personalities and analyzing the characteristics of their online behavior. In [33], the authors proposed a machine learning approach to predict a user's personality based on their social profile. The authors in [36] studied users' attention economy in news consumption by analyzing users' portraits. The authors in [30] identified the similarities and differences in personality characteristics of Internet and social media addiction profiles. In [34], the authors found that extraversion is highly related to Facebook usage. Similarly, the authors in [35] found a positive relation between Big Five personality traits and user activity. In [42], the authors examined the effects of follower number and comment tone on judgments about opinion leaders. There are also some specific applications based on analyzing user attributes, such as optimizing recommendation systems [32], modeling information dissemination [43,50], event detection [37,51], and bot detection [39,40]. However, few studies have examined what the online portraits of users with specific offline personalities look like. As described before, most studies about number preferences were conducted offline in a certain field. To the best of our knowledge, nobody has verified whether this phenomenon exists on social media. Moreover, if it exists, which numbers are preferred, and what are the attributes of users with these number preferences (e.g., social media usage, attention to opinion leaders, and especially social activity and extroversion)? This paper bridges the gap between research on number preferences and how they present on social media.

Datasets
As one of the most popular online social media platforms in China, Sina Weibo [52] plays an important role in spreading popular topics, entertainment gossip, and celebrity anecdotes. The collective behavior of users engaging in Sina Weibo makes it possible to study how traditional Chinese culture is reflected on social media. We collected users' attribute information on Weibo at three different times (see Table 1), including each user's nickname, number of friends, number of followers, number of posts, gender, and city. In addition, we also collected data from China's sixth census (conducted in 2011), including the level of economic development, population distribution, gender composition, age composition, urbanization, and education level of different regions.

Methods
Based on the data collected in 2012, Weibo users exhibit a distinct preference for the number 200 at both the national and regional levels. To verify whether this phenomenon is stable and long-standing, we randomly selected 68,655 users whose number of friends was less than 200 in 2012 and then analyzed the distribution of their number of friends in 2021. Meanwhile, to confirm that this phenomenon reflects normal human behavior rather than the activity of robots, we randomly selected 6620 users whose number of friends was near 200 and then analyzed the types of their posts (i.e., tweet or retweet). After confirming that this phenomenon is stable and has research significance, we systematically compared the characteristics of users with and without the 200 preference at the static, dynamic, and regional levels. The variables used in this paper, their measurement, and the data they draw on are defined below.
Variables description and definition. Here, we describe the variables used in this paper. The number of friends (i.e., $k_i$) of a user is how many users he/she follows. The number of followers (i.e., $k_o$) of a user is how many users follow him/her. The number of posts (i.e., $m$) is how many posts he/she has created since registration. The post content is the content generated by users. It may contain hyperlinks, hashtags, or mentions. According to whether it is an original post or not, the post content can be classified into two categories: tweet (post) or retweet (repost). The Proportion of Original Posts is the percentage of a user's posts since registration that are original tweets. In Section 4.2, we use $m$ and $k_o$ to represent the user's Activity and Popularity, respectively. The Follower Count of Friends is the number of followers a user's friends have. Based on the friends list, we crawled the number of followers of each of his/her friends. It directly reflects the user's Attention Tendency, especially the user's preference regarding the popularity of their friends.
Measuring portraits evolution. In Section 4.3, we explore the evolution of users' portraits from 2012 to 2018. For each user, we define $\delta$, the fluctuation of his/her number of friends, followers, or posts, as:
$$\delta = \frac{|X(2018) - X(2012)|}{X(2012)},$$
where $X(t)$ represents the number of friends, followers, or posts in year $t$. Then, we calculate the fraction of users with $\delta \le 5\%$ in each of the two user groups (with and without the 200 preference). Similarly, we calculate the fraction of users whose locations remain unchanged.
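The stability measure described above can be sketched in a few lines of Python. This is only an illustrative sketch: it assumes that δ is the relative change |X(2018) − X(2012)| / X(2012), which is how we read "changes less than 5% compared to 2012"; the function names, the handling of a zero baseline, and the example counts are hypothetical and are not taken from the original analysis.

```python
def fluctuation(x_start, x_end):
    """Relative change of a count between two snapshots (assumed form of delta)."""
    if x_start == 0:
        # Relative change is undefined for a zero baseline (our assumption):
        # report 0 when nothing changed, infinity otherwise.
        return 0.0 if x_end == 0 else float("inf")
    return abs(x_end - x_start) / x_start


def stable_fraction(pairs, threshold=0.05):
    """Fraction of users whose fluctuation is at most `threshold`."""
    if not pairs:
        raise ValueError("pairs must not be empty")
    stable = sum(1 for start, end in pairs if fluctuation(start, end) <= threshold)
    return stable / len(pairs)


# Hypothetical (2012, 2018) friend counts, for illustration only.
example_pairs = [(200, 204), (200, 260), (201, 201), (195, 320)]
print(stable_fraction(example_pairs))
```

The same function can be applied separately to friends, followers, and posts, and separately to the user groups with and without the 200 preference, to obtain the fractions compared in the text.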
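Preference Intensity. In Section 4.4, we study the preference intensity for number 200 in different regions. We define the preference intensity of number 200 as:
$$PI = \frac{\bar{p}_{\mathrm{with}}}{\bar{p}_{\mathrm{without}}},$$
where
$$\bar{p}_{\mathrm{with}} = \frac{1}{3} \sum_{k_i \in \{200, 201, 202\}} p_{k_i}$$
and
$$\bar{p}_{\mathrm{without}} = \frac{1}{14} \sum_{k_i \in \{193, \ldots, 199\} \cup \{203, \ldots, 209\}} p_{k_i}.$$
Here $p_{k_i}$ denotes the proportion of users whose number of friends is $k_i$, and 3 and 14 denote the number of different $k_i$ in the user groups with and without the preference for the number 200, respectively.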
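As an illustration of how such an intensity could be computed from raw friend counts, the following Python sketch assumes that the preference intensity is the ratio of the mean proportion over the 3 values 200 to 202 to the mean proportion over the 14 neighbouring values 193 to 199 and 203 to 209. That reading is our assumption, chosen because it yields a value close to 1 when there is no peak at 200; the function and constant names are hypothetical.

```python
from collections import Counter

PREFERRED = (200, 201, 202)                                   # 3 values
NEIGHBOURS = tuple(range(193, 200)) + tuple(range(203, 210))  # 14 values


def preference_intensity(friend_counts):
    """Ratio of mean user share at 200-202 to mean share at 193-199 and 203-209."""
    total = len(friend_counts)
    if total == 0:
        raise ValueError("friend_counts must not be empty")
    counts = Counter(friend_counts)

    def share(k):
        # Proportion of all users whose number of friends equals k.
        return counts[k] / total

    mean_preferred = sum(share(k) for k in PREFERRED) / len(PREFERRED)
    mean_neighbours = sum(share(k) for k in NEIGHBOURS) / len(NEIGHBOURS)
    if mean_neighbours == 0:
        raise ValueError("no users in the neighbouring range 193-199 / 203-209")
    return mean_preferred / mean_neighbours


# Hypothetical data: one user at every value from 193 to 209 (no peak),
# then the same data with one extra user at each of 200, 201, and 202.
flat = list(range(193, 210))
peaked = flat + [200, 201, 202]
print(preference_intensity(flat), preference_intensity(peaked))
```

Applying the function to the friend counts of the users located in each region would give one intensity value per region.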
Linking user portraits to regional GDP and Education. In Section 4.5, we study the correlations between preference intensity and regional characteristics (i.e., GDP and education). The location information (i.e., city) in user portraits can be used to identify which province he/she belongs to. The census data records each province's information about GDP, education, gender/age composition, etc. We link province information to census data and get the corresponding region's GDP and education information.

Results
We crawled the attribute information of 6,836,935 Sina Weibo users using a random walk sampling method [53] in 2012. When calculating the distribution of the number of friends, we observe that there exists a very sharp peak at 200. In Figure 1a-j, we demonstrate this phenomenon in different provinces in detail. Meanwhile, Figure 1l indicates that there is a peak between 200 and 202. This phenomenon reveals users' preference for the number 200.
To better analyze the statistical characteristics of the users with the preference of the number 200, we investigate the differences among various aspects of three user groups based on their number of friends: (i) Group 1, whose number of friends is 200, 201, or 202, which includes 48,960 users; (ii) Group 2, whose number of friends is between 193 and 199, which consists of 88,573 users; (iii) Group 3, whose number of friends is between 203 and 209, which includes 76,954 users. The number of friends for these three user groups is very close, which can exclude other factors' impact on subsequent analysis as much as possible and reflect the characteristics of users with the preference for the number 200 better.
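The grouping rule above is simple enough to state as code. The following Python sketch uses the boundaries given in the text; the function name and the choice to return None for users outside 193 to 209 are our own.

```python
def assign_group(k_i):
    """Map a user's number of friends to Group 1, 2, or 3 (None if outside 193-209)."""
    if 200 <= k_i <= 202:
        return 1  # with the preference for the number 200
    if 193 <= k_i <= 199:
        return 2  # just below the peak, without the preference
    if 203 <= k_i <= 209:
        return 3  # just above the peak, without the preference
    return None


# Hypothetical friend counts, for illustration only.
print([assign_group(k) for k in (150, 193, 200, 202, 203, 210)])
```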

Evolution of the Number of Friends
In order to study the evolution of the distribution of the number of friends, we randomly selected 22,604, 22,936, and 23,115 users whose number of friends in 2012 was in the ranges 120 to 130, 145 to 155, and 170 to 180, respectively, and then crawled their attribute information in 2021. In Figure 2a, the histogram and the corresponding solid line represent the distribution of the number of friends for these three user groups in 2012 and 2021, respectively. We can observe that the distribution of the number of friends in 2021 also has a peak at 200. Meanwhile, we used resampling methods (i.e., bootstrap and jackknife) [54] to estimate the sampling bias; here, we used uniform sampling with replacement. From Figure 2a, we can see that the discrepancies between the resampled datasets and the original dataset are minimal for all corresponding solid lines. These results suggest that sampling bias does not materially affect the conclusion that this is a stable phenomenon. That is to say, when a user's number of friends grows to 200, its growth slows down, so the number of friends stays at 200 for a relatively long time. This observation also suggests, from another angle, that the preference for 200 friends is not robots' behavior.
Moreover, we randomly selected 1302 users with the preference for the number 200 and 5318 users without the preference for the number 200 in 2012. In 2018, we crawled all posts these users had published since registration and computed the proportion of original posts for each user. Figure 2b shows the distribution of the proportions of original posts, and we can see that the two distributions are almost the same. Specifically, 46.01% of users with the preference for the number 200 have a proportion of original posts above one half, which is similar to the figure for users without the preference for the number 200 (43.68%). This indicates that the posting behavior of users with the preference for the number 200 is consistent with that of normal users, which further illustrates that the preference for the number 200 is not robots' behavior.
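To make the resampling check used for Figure 2a concrete, the following Python sketch shows one way to bootstrap the share of users sitting at exactly 200 friends using uniform sampling with replacement. It is a minimal illustration under our own assumptions (the number of resamples, the seed, the function names, and the toy data are all hypothetical), not the procedure actually used for the figure, and it does not cover the jackknife.

```python
import random
from collections import Counter


def share_at(values, target):
    """Proportion of entries in `values` equal to `target`."""
    return Counter(values)[target] / len(values)


def bootstrap_shares(values, target=200, n_resamples=200, seed=0):
    """Share of `target` in each bootstrap resample (uniform, with replacement)."""
    if not values:
        raise ValueError("values must not be empty")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    shares = []
    for _ in range(n_resamples):
        resample = rng.choices(values, k=n)  # draws n items with replacement
        shares.append(share_at(resample, target))
    return shares


# Hypothetical friend counts with an artificial peak at 200.
toy_counts = [200] * 30 + list(range(150, 220))
shares = bootstrap_shares(toy_counts)
print(share_at(toy_counts, 200), sum(shares) / len(shares))
```

If the mean of the resampled shares stays close to the share in the original sample, the peak is unlikely to be an artifact of which users happened to be drawn.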

User Activity, Popularity, and Attention Tendency
We start by studying the activity, popularity, and attention tendency of users whose number of friends is 200, 201, or 202 (with the preference for the number 200). From Figure 3a,b, we can see that users with the preference for the number 200 have fewer posts and followers than users without the preference for the number 200, which means the former are less active and less popular. For instance, among the users with (without) the preference for the number 200, 33.12% (9.96%) posted fewer than 10 posts, and 26.11% (8.15%) have fewer than 20 followers. Moreover, the behavior of the two groups of users without the preference for the number 200 is almost the same.
Further, we analyzed the attention tendency (see Methods) of these users. Figure 3c shows that users with a preference for the number 200 are relatively more inclined to follow popular users than users without the preference for the number 200. For instance, 43.42% of users with a preference for the number 200 follow popular users (number of followers ≥ 100,000), while only 26.51% of them follow normal users (number of followers ≤ 500). Among users without the preference for the number 200, however, the corresponding percentages are 36.13% and 32.75%, respectively.

User Portrait Evolution
In order to further investigate the activity of users with the preference for the number 200, we next focus on the evolution of users' portraits over time. In 2018, we also collected the attribute information of users whose number of friends in 2012 was between 193 and 209. We obtained valid data for 43,198 users with the preference for the number 200 and 146,404 users without the preference for the number 200.
For users whose number of friends was k in 2012, we count the proportion of users whose number of friends, followers, and total posts fluctuated within 5% by 2018, i.e., the corresponding number changed by less than 5% compared to 2012 (see Methods). From Figure 4a-c, we can see that, compared to users without the preference for the number 200, the changes in the number of friends, followers, and total posts are smaller for users with the preference for the number 200. For instance, in Figure 4a,c, only about 10% of users without the preference for the number 200 change by less than 5%, while the corresponding proportions exceed 40% for users with the preference for the number 200. Similarly, we count the proportion of users whose geographical position remains unchanged from 2012 to 2018, as Figure 4d shows. Although both kinds of users rarely changed location during this period, users with the preference for the number 200 changed even less.

Regional Difference
Next, we discuss users' preference intensity for the number 200 in different regions. As defined in Section 3.2, PI will be approximately equal to 1 if no users have a preference for the number 200, because there will be no peak at 200 in this case. A higher value indicates a stronger preference. Figure 5a shows that the preference intensity for the number 200 in northwest China is much stronger than that in southeast China, and the two areas are separated well by the Hu Line [55], a line dividing the population, urbanization level, and culture transformation. Moreover, the preference intensity of number 200 in Taiwan ...
Further, we rank the preference intensity of number 200 in 34 regions, as Figure 5b shows. We can see that the preference intensity in the northwest region (e.g., Tibet, Ningxia, Qinghai, Xinjiang, Gansu, etc.) is significantly higher than the national average, while that in the southeast region (e.g., Jiangsu, Zhejiang, Shanghai, Fujian, Guangdong, etc.) is far below the national average. For instance, Tibet and Ningxia are 47% and 33% higher than the national average, respectively, while Shanghai and Fujian are 28% and 23% below it, respectively.
Figure 5. Each point represents a province, and the black line represents the fitted line of the least squares method with a confidence of 95%; $R^2$ is 0.56 and 0.32, respectively. From this figure, we can see that there exist obvious regional differences in preference intensity: it is relatively strong in the northwestern region and weak in the southeastern coastal region. Moreover, there is a significant negative correlation between preference intensity and the level of regional economic/educational development.
Figure 5a shows that the 200 preference intensity roughly decreases from northwest to southeast China, which is consistent with the regional economic and educational level. That prompted us to further study their correlations.
We collected relevant data from the official website of the National Bureau of Statistics [56], including the regional GDP of 31 provinces/municipalities in mainland China (without the data of Hong Kong, Macao, and Taiwan) and the average years of education in each province in the sixth population census in 2011. Figure 5c,d exhibit the negative correlation between the preference intensity of the number 200 and both the economic level and the educational level, which implies that the less developed a region's economy and education are, the stronger its preference intensity for the number 200 is.
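For readers who want to reproduce this kind of regional analysis, the following Python sketch fits a least-squares line and computes the coefficient of determination R² for paired province-level values. It is a generic ordinary least-squares sketch written for illustration; the function name and the toy numbers are hypothetical, and it does not compute the 95% confidence band mentioned for the figure.

```python
def least_squares_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept, plus R^2."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("xs must not all be equal")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return slope, intercept, r_squared


# Hypothetical values: x could be average years of education per province,
# y the preference intensity of the same province. Not real data.
toy_x = [1.0, 2.0, 3.0, 4.0]
toy_y = [8.0, 6.0, 4.0, 2.0]
print(least_squares_fit(toy_x, toy_y))
```

A negative slope together with a reasonably large R² corresponds to the kind of negative correlation between preference intensity and regional development described above.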

Conclusions
In this work, we systematically studied 6.8 million Sina Weibo users and found that a significant fraction of users have a special preference for the number 200, i.e., they prefer to follow 200 friends. We comparatively portrayed users with the preference for the number 200 from several different perspectives, such as user activity, popularity, attention tendency, regional distribution, and economic and education level. The results show that users with the 200 preference have lower social activity and fewer followers than users without the preference for the number 200. We found that they are more inclined to pay attention to users with higher popularity, and that their social portraits change relatively slowly. Moreover, we also found that the preference intensity of the number 200 is stronger in regions with relatively undeveloped economic and educational levels. Further analysis revealed a significant negative correlation between the preference intensity and the development of regions. This indicates that users with the preference for the number 200 are likely to be vulnerable groups in society and to be influenced by opinion leaders in the network.
We argue that people have a special preference for 200 friends for a few reasons. On the one hand, in traditional Chinese culture, people have a special preference for some numbers, such as 6, 8, 10, 100, 200, etc. On the other hand, due to the limits of brain capacity, people can only maintain effective social relationships with a certain number of friends. However, online social networks make information exchange among people more convenient than ever, which is the most likely reason why the number of friends is a little higher than the Dunbar number (150) [47] in online social networks.

Discussion
This paper systematically studied the phenomenon of number preferences on one of China's most popular online social media platforms. Nevertheless, two questions need to be seriously considered when evaluating its universality around the world. First, is there a similar phenomenon of number preference on other social networks? Second, if so, is the exact number still 200, or is it another number that depends on the cultural background? It is difficult to answer these two questions at present. Affected by the Facebook-Cambridge Analytica data breach [57,58], the world's mainstream social media, such as Twitter, Facebook, and WeChat, have strengthened user privacy protection through restricted APIs and anti-crawler mechanisms [59]. It is now almost impossible to collect large amounts of user data in a short time. Besides, collecting and using data correctly is another issue that cannot be ignored [60].
However, verifying its universality may still be possible in the future. On the one hand, if other social media re-open their previous APIs to provide access to more user data, or if researchers collaborate with the social media platforms to get the required data, it will be possible to examine this phenomenon on different social network platforms in China and beyond. On the other hand, large-scale offline experimental investigations, which take more time and effort, might also help validate the above findings.

Data Availability Statement: Data associated with this study will be available on the website of the corresponding author after publication: www.huyanqing.com (accessed on 1 January 2018).

Conflicts of Interest:
The authors declare no conflict of interest.