A Privacy Measurement Framework for Multiple Online Social Networks against Social Identity Linkage

Abstract: Recently, the number of people who are members of multiple online social networks simultaneously has increased. However, if these people share everything with others, they put their privacy at risk. Users may be unaware of the privacy risks involved in sharing their sensitive information on a network. Currently, many research efforts focus on social identity linkage (SIL) across multiple online social networks for commercial services, which exacerbates privacy issues. Many existing studies consider methods of encrypting or deleting sensitive information without considering whether this is reasonable for social networks. Meanwhile, these studies ignore privacy awareness, which is fundamental and critical. To enhance privacy awareness, we discuss a user privacy exposure measure for users who are members of multiple online social networks. With this measure, users can be aware of the state of their privacy and their position on a privacy measurement scale. Additionally, we propose a straightforward method through our framework to reduce information loss and foster user privacy awareness by using spurious content for required fields.


Introduction
With the progress of society, the rapid development of the Internet, and increasingly onerous work, increasing numbers of people prefer to communicate on the Internet because it is efficient and inexpensive. As a result, in recent years, online social networks (OSNs) have expanded tremendously and emerged as an indispensable part of human life. People can chat and share messages, news, pictures, videos, and other resources via OSNs. Moreover, various types of social networks are currently used, each with its own unique features [1,2]. People make use of OSNs to various degrees according to their needs. Inevitably, people who use these sites create an online role. Additionally, for various reasons including peer pressure, conformity, lack of privacy awareness, and blind trust in social networking sites and other users, users are encouraged to disclose personally identifiable information (PII). Furthermore, social networking sites encourage users to disclose personal information so that other users can find them more easily, which promotes user stickiness [3]. It seems that users blindly trust OSN service providers to handle user data in a fair and conscientious way and to continue to do so in the future. The reality is different. Uber acknowledged in November 2017 that for more than a year it had covered up a hacking attack that stole personal information about more than 57 million customers and drivers. US officials said they would examine claims that a data analysis firm mishandled Facebook users' information in order to support Donald Trump's election campaign; Facebook had known of this for two years, but no measures were taken. In 2017, Twitter publicly announced that it was abandoning the DNT (Do Not Track) privacy protection standard. According to a recent study, more than 6.05 billion pieces of personal information have been disclosed in China [4]. Thus, privacy protection ultimately depends on the users.
In the field of computer security, the basic principle of protecting privacy is preventing information from escaping its intended boundaries. However, privacy on OSNs runs contrary to the goal of the people using them. The only way to mitigate this paradox is to find a reasonable boundary between protecting privacy and disclosing PII. However, although people find it inherently easy to understand physical concepts, they have difficulty with virtual concepts, and privacy is a virtual concept. Initially, most OSNs offered their users privacy controls that were simple to use but limited; for example, a privacy control might enable users to set their entire profile as public, visible to friends only, or private (visible only to the user). With growing demand from users and increasing attention to privacy in the media, many OSNs (e.g., Facebook) have started offering their users more control, such as the ability to set the visibility of individual items. However, if interfaces become overly complicated, users will not understand the settings or will find them too cumbersome, and thus they might set them in an unreasonable manner or ignore them. In a case study, Gross and Acquisti [5] show that most users do not change the default privacy settings provided by the OSN even when sharing a large amount of information on their profile. In another case study, Tufecki [6] concludes that privacy-aware users are more reluctant to join social networks, but once they join, they still disclose a vast amount of information. In other words, an overwhelming majority of people have considerable difficulty understanding privacy settings [7], especially now that most users are using multiple social networks [8][9][10][11]. Moreover, many researchers focus on social identity linkage (SIL) across multiple OSNs, which is an effort to identify the same users across multiple heterogeneous OSNs and integrate the various networks. The compelling nature of the field has motivated many studies [12][13][14][15][16][17][18][19][20]; however, few people pay attention to the security issues this type of research introduces. Given this trend, malicious attackers can assemble a complete online role from profiles across multiple OSNs, and serious harm can be caused to the real individual in various ways [21]. Given this background, privacy protection on a single platform is far from enough [8][9][10][11][22,23].
At present, there is no perfect solution to this problem because users have varied requirements for privacy protection that depend on the context. Therefore, the best solution is to provide a method to quantify the privacy of individuals, transform the virtual concept of privacy into a visible physical space, help users accurately recognize the state of their privacy, and help users improve their privacy. We are deeply aware that only by letting users understand their privacy leakage can we better protect user privacy. In this paper, we propose a measurement framework based on multiple OSNs to ensure that users can understand the state of their privacy and rationally adjust their privacy settings to improve it.

Related Work
The role of OSNs is to represent social relationships that exist in real life, which form the real-life social network (RSN). The PII stored in OSNs can be modelled as an online social graph, and there is a one-to-one mapping from the RSN to the online social graph model. If multiple OSNs are used, this mapping can be found especially quickly, accurately, and inexpensively due to SIL research. Therefore, the disclosure of PII may lead to malicious attacks from both cyberspace and the real world [24,25]; examples of these attacks include, but are not limited to, tracking, defamation, spamming, phishing, identity theft, profile cloning, and Sybil attacks.
In recent years, many studies have been performed on privacy preservation via data mining and publishing; additionally, some privacy protection methods have been proposed for specific scenarios, such as RFID (Radio Frequency Identification) and smart grids [26,27]. However, little has been explored regarding user privacy awareness, which can be defined as the individual's awareness of the actions and behaviours required to protect their personal information [28,29]. In contrast, SIL is developing rapidly [12][13][14][15][16][17][18][19][20].
In the existing research on privacy metrics, some researchers measure a single aspect. For example, Dey et al. [30] studied the amount of work that has been done regarding the harm to users caused by the disclosure of age information. Liang et al. [31] conducted an in-depth study on the privacy disclosure of a social network from the perspective of image deletion delay. Srivastava et al. [32] discussed the issue of privacy disclosure from the dissemination of character information.
Meanwhile, others have tried to solve the problem based on the overall consideration of the user. Maximilien et al. [33] discussed the concepts of attribute sensitivity and profile visibility, then combined these two values using a Bayesian method to evaluate privacy. Liu et al. [34] broadened their study in a different way, using item response theory (IRT) in combination with sensitivity and visibility to provide an intuitive and mathematical approach for calculating privacy metrics of OSNs. Fang et al. [7] devised a template for privacy wizards to help users complete profile settings; however, they did not explain why. In the opinion of Zeng et al. [35], personal privacy disclosure levels are based on the protection of information that public friends disclose; they proposed a framework to assess the privacy disclosure in a community. Similarly, Li Minghui et al. [36] believed that attackers could use the background knowledge of public neighbours to obtain the privacy of victims; they used K-anonymity and L-diversity to approach this challenge, but these two approaches do not completely solve the issue. Ruggero G. Pensa et al. [23] used a circle-based definition of privacy score to measure privacy leakage and applied a learning approach to help users change privacy settings.
Currently, users are not confined to using only one social network. Thus, the above studies, which are based on a single platform, cannot accurately quantify privacy leakage. However, studies based on multiple OSNs are extremely rare and immature. Irani et al. [37] revealed that attackers can aggregate a user's PII on multiple OSNs for identity and password recovery attacks. However, they did not propose an effective method for solving this problem, only suggesting that users should disclose as little PII as possible. Patsakis et al. [25] proposed a framework based on scenarios with multiple social networks that can achieve the goal of protecting user privacy; however, it is impractical for social networks to interact with one another. Erfan et al. [38] combined attribute sensitivity and visibility, and used statistical fuzzy systems to solve the problem of privacy metrics on multiple OSNs. Nonetheless, statistical fuzzy systems, which include fewer quantitative components and more qualitative components, are hard to make convincing.
Over the course of our research, we noticed that many researchers are working on SIL. The results of SIL can benefit many applications, such as building interest models, providing a better view of expertise, improving social recommendations, and improving the ability to search for people across websites. However, these studies ignored various privacy and security threats, such as identity theft and profile cloning, which can lead to compromised accounts, directed spam, phishing, online profiling by advertisers and attackers, and online stalking. Although the security issue is serious, an inevitable situation must be faced, namely, the vast number of users on social networks. Given an SIL problem instance of two social network platforms with $N_1$ and $N_2$ users, the number of all possible pairs of users to examine is given by [13]:

$$N_1 \times N_2 \qquad (1)$$

However, because $N_1$ and $N_2$ can each be in the billions, (1) is impossible to evaluate exhaustively. Existing studies have applied heuristic knowledge regarding overlapping PII, such as username, avatar, or email address, to reduce this scope. Therefore, a reasonable PII setting can effectively prevent SIL. The privacy protection method we propose is inspired by this observation.
As outlined above, we encourage users to deliberately fill in spurious PII, especially for items that are sensitive and mandatory. The work in this paper is inspired by the privacy score defined by Liu and Terzi [34]; we have substantially extended the measurement method in [34] to adapt it to scenarios with multiple social networks, reflecting how OSNs are currently used by most users. Additionally, the discussion and usage of attribute content and privacy awareness are heuristically added to better combine the behaviour and psychology of users in social networks and more accurately measure privacy leakage. Based on these improvements, we propose a new approach to prevent SIL. Our main contributions are as follows:

Problem Descriptions and Notation
In current research, there is no effective way to warn users how much of their privacy will be exposed when PII is divulged or certain changes are made to their privacy settings. Without a practical and effective approach to quantify, measure, and evaluate privacy, it is hard for users to determine how much information they are willing to share and to understand the risk involved. It is likewise impossible for OSN service providers to make appropriate policies to protect user privacy. Meanwhile, privacy measurement is a challenging issue because privacy is a virtual concept whose definition is subjective; users have different opinions and expectations about privacy. Social networking sites protect privacy through profile settings, which cover attributes consisting of structured data; privacy can be identified through this structured information, such as name, hometown, and birthdate. Therefore, using the attributes of a profile is a good way to measure the privacy of individuals.
However, the usage of attributes also introduces problems. Platforms vary regarding the attributes they require users to provide [1,2,40], and each attribute has a different impact on individual privacy; for example, birthdate, phone number, and address cause unequal levels of privacy leakage. Therefore, we first must determine a method of quantifying the degree of leakage for attributes.
Here, we introduce the mathematical notation we will use in the rest of our paper. We use a set of n users $\mu = \{u_1, u_2, \ldots, u_n\}$ corresponding to the individuals participating in OSNs. Each user has a set of m attributes or profile items $\alpha_n = \{a_1, a_2, \ldots, a_m\}$; for instance, the PII includes items such as username, gender, birthdate, and hometown. In addition, for convenience, we consider that $\beta = \{b_1, b_2, \ldots, b_s\}$ corresponds to the same attribute on s OSNs. More specifically, $\beta^{\varepsilon}_{m} = \{b_1, b_2, \ldots, b_s\}$ corresponds to the extraction difficulty of attribute m on s OSNs.
Other general notations that are used in our framework are presented in Table 1.

The Measurement Method
To calculate a user's privacy score, we need to select the attributes that affect a user's privacy on a social network. These attributes (such as birthday, email address, and address) can be obtained from the user's profile, messages, pictures, and status updates. It should be noted that our research does not include methods for obtaining the content of these attributes from the above sources, but rather ensures the information actually exists and can be obtained in practice.
Inspired by the privacy score defined by Liu and Terzi [34], we add a fine-grained approach for a more accurate result. We calculate extraction difficulty ε, accessibility δ, reliability ω, and privacy awareness γ for each attribute on each platform and then calculate visibility. Finally, we calculate the privacy score p using sensitivity θ.

Extraction Difficulty
The difficulty of extraction ε represents the degree of difficulty associated with obtaining the attribute contents. It is relatively easy to obtain attributes from a profile that a user provided, but some attributes may not be provided or may not be required. In that case, attributes must be obtained or inferred from messages, images, videos, and other media, which is relatively complicated. For example, it is difficult to recognize a user's hometown in a picture or a video. To calculate the difficulty of the data extraction, we define the following formula:

$$\varepsilon_m = \frac{1}{s}\sum_{i=1}^{s} \varepsilon^{i}_{m} \qquad (2)$$

where $\varepsilon^{i}_{m}$ is the value that expresses the extraction difficulty of attribute m on the ith platform. We sum all the values of $\varepsilon^{i}_{m}$ and average them to obtain $\varepsilon_m$, which indicates the overall extraction difficulty of attribute m on s social networks.
To measure $\varepsilon^{i}_{m}$, we defined three levels: 1 is difficult, 2 is relatively easy, and 3 is easy. The specific definitions are as follows: 1 represents content obtained from pictures or other approaches, 2 represents content obtained from character messages, and 3 represents content directly acquired from the profile. This approach is taken because not every platform provides all attributes in the profile. In our experimental data, we document in detail the extraction difficulty of each attribute on each OSN.
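The averaging described above can be sketched in a few lines. This is only an illustration: the source labels `"media"`, `"text"`, and `"profile"` are hypothetical names for the three levels, and the sketch assumes that every platform yields one level for the attribute.

```python
from statistics import mean

# Levels as defined in the text; a higher value means easier extraction.
# The dictionary keys are hypothetical labels, not names from the paper.
LEVEL = {"media": 1, "text": 2, "profile": 3}


def extraction_difficulty(sources):
    """Average the per-platform levels of one attribute over all s platforms."""
    return mean(LEVEL[src] for src in sources)


# One attribute observed on four platforms (illustrative input only).
print(extraction_difficulty(["profile", "profile", "text", "media"]))
```

With this input the per-platform levels are 3, 3, 2, and 1, so the average is 9/4.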

Accessibility
In general, OSN operators provide a way to protect privacy by allowing users to set accessibility (i.e., visibility to specific users) for each attribute in their profile, ranging from making the attribute visible to only the user to making it visible to everyone. Accessibility signifies how many people can see the attribute content. According to popular OSN settings, we define four different levels of accessibility: 1 represents content access by only the owner of the information; 2 represents content access by friends; 3 represents content access by a specific group of people, such as colleagues and schoolmates; and 4 represents publicly available content.
Because our research is based on multiple platforms, we calculate each attribute separately for each platform. Moreover, we consider that not all users fill in the same content for one attribute on each platform; therefore, we compiled statistics of user profiles, as shown in Table 2, where 1-4 indicate the accessibility of the attribute, and A-Z represent the content of the attribute; we use letters to represent attribute content to protect user privacy. In addition, the same letter in the same attribute of different users does not represent the same content, and 0 means that we cannot access the attribute content, which means a user does not disclose any information for this attribute. For instance, $\beta_{\delta,\lambda}$ = (1A, 3B, 1A, 0, 4B) is the value of an attribute for user λ; after the repeated entry 1A and the undisclosed entry 0 are removed, the final result is (1 + 3 + 4)/3 ≈ 2.67. Unlike other researchers, we consider the situation in which users fill in different content for the same attribute, which means they fill in spurious content to protect privacy. Existing algorithms can be used to check the consistency of character strings, while content extracted from images and long texts is matched manually; in the near future, we expect to use deep learning to remove the need for this manual step.

Algorithm 1. Calculation of accessibility
Notably, different content may have the same accessibility, which causes privacy leakage. Repeated items are removed because they do not provide additional privacy losses, i.e., they are redundant. Values of 0 are excluded to address cases like (4A, 0, 0, 0, 0), where if we do not rule out 0, the result is 2, which is unreasonable because this attribute has been fully disclosed; its accessibility should not be lower than that of (3A, 3B, 3C, 0, 0). The average is used to compare the differences between (4A, 4B, 4C, 4D, 4E) and (4A, 4A, 4A, 4A, 4A); if this approach is not used, the former result is 20 and the latter is 4, although the accessibility should be exactly the same.
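The three rules above (remove repeats, exclude 0, average the rest) can be sketched as follows. The tuple representation and the return value of 0.0 for a fully undisclosed attribute are assumptions made for illustration; they are not taken from Algorithm 1.

```python
def accessibility(entries):
    """Average accessibility of one attribute across platforms.

    entries holds one (level, content) pair per platform, with level 0
    meaning the attribute is not disclosed there. Repeated pairs are
    dropped, undisclosed entries are ignored, and the rest are averaged.
    """
    disclosed = {(level, content) for level, content in entries if level != 0}
    if not disclosed:
        return 0.0  # assumption: nothing disclosed on any platform
    return sum(level for level, _ in disclosed) / len(disclosed)


# The example from the text: (1A, 3B, 1A, 0, 4B).
example = [(1, "A"), (3, "B"), (1, "A"), (0, None), (4, "B")]
print(round(accessibility(example), 2))  # 2.67
```

Under these rules (4A, 0, 0, 0, 0), (4A, 4B, 4C, 4D, 4E), and (4A, 4A, 4A, 4A, 4A) all evaluate to 4, which matches the reasoning in the paragraph above.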

Reliability
Attribute reliability quantization, one of the most important aspects, determines whether privacy metrics are accurate because spurious content does not have the same impact as real content.
Previous research showed that people are willing to share real information with others on social platforms. Therefore, [28] adopts the following strategy: as the number of sources of disclosure increases, reliability increases. Because we consider attribute content and users are more likely to fill in real PII, we improve on this by using the maximum number of repetitions of the same content to measure reliability.
While collecting and processing our experimental data, we analysed the data to understand user behaviour. We found that the content used to fill an attribute on various platforms is very diverse in terms of reliability. For our 279 samples, the content reliability on several platforms is shown in Table 3, from which we can see that the reliability on a single platform is much less than 50% but grows approximately linearly when the same content is provided on more platforms. Then, as still more platforms show the same content, growth slows accordingly, so that the global trend forms an S-curve. It can be seen from the curve in Figure 1 that the function used by Erfan [38], (3), does not fit the actual situation well. Meanwhile, Liu and Terzi [34] use the item response theory (IRT) model to calculate the overall privacy score without considering reliability.
Because IRT (also known as latent trait theory, strong true score theory, or modern mental test theory) is a paradigm for the design, analysis, and scoring of tests, questionnaires, and similar instruments that measure abilities, attitudes, or other variables [41], we use IRT as our theoretical basis to design the function; the original IRT function is shown in (4). First, we set the value of b to 3/2 based on Table 3: because the difference in reliability is largest between 1 and 2, the inflection point lies between 1 and 2. In addition, q is the maximum number of platforms that provide the same attribute content. We then use the least squares method and the data in Table 3 to fit parameter a, and finally construct the function shown in (5). The function curve is shown in Figure 1. The range of the function is (0, 1). The larger the value of s, the higher the reliability.
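As a rough sketch of how q and an S-shaped reliability curve fit together, the snippet below counts the most frequently repeated content and feeds it to a two-parameter logistic function with its inflection point at b = 3/2. The logistic form is the standard two-parameter IRT curve and is assumed here; the slope `a=1.0` is a placeholder, since the fitted value is not given in this section.

```python
import math
from collections import Counter


def max_repeat(contents):
    """q: the largest number of platforms that share one identical content.

    Undisclosed entries are passed as None and ignored.
    """
    counts = Counter(c for c in contents if c is not None)
    return max(counts.values(), default=0)


def reliability(q, a=1.0, b=1.5):
    """Two-parameter logistic curve with inflection point at q = b.

    a=1.0 is a placeholder slope, not the value fitted from Table 3.
    """
    return 1.0 / (1.0 + math.exp(-a * (q - b)))


q = max_repeat(["A", "B", "A", None, "B"])  # "A" and "B" each appear twice
print(q, reliability(q))
```

Whatever slope is used, the curve stays inside (0, 1) and increases with q, which is the qualitative behaviour described above.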

Privacy Awareness
In the measurement of privacy leakage, privacy awareness refers to the users' knowledge and understanding of the privacy options available to them on the social networking site. Users with higher privacy protection awareness usually hide sensitive information or fill in spurious attribute content to confuse attackers and increase the difficulty of malicious behaviour. Therefore, it is necessary to measure privacy awareness (although it was not considered in previous studies).
Our approach for measuring privacy awareness is to count the number of different contents that a user fills in for the same attribute on multiple OSNs. For example, user λ filled in (A, B, A, C, B) for an attribute, in which at least two of the values are spurious. Moreover, privacy leakage is minimal in the case (0, 0, 0, 0, 0), which indicates that the user has not disclosed any information about this attribute in the profile or other media. The higher the user's privacy awareness, the lower the possibility that their privacy is exposed. To measure privacy awareness, we use a function designed as a 3-parameter logistic model (3PL), which is a variant of the IRF (item response function). The IRF provides the probability that a person with a given ability level will answer correctly. People with lower ability have less chance of answering correctly, while people with high ability are very likely to answer correctly [42]; thus, this method can be an excellent expression of the human sense of privacy.
As seen from the function curve in Figure 2, the greater the number of instances of spurious content, the greater the difficulty for a malicious attacker to attack and the lower the user privacy leakage, which expresses stronger user privacy awareness.
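The counting step can be illustrated as follows, together with the generic 3PL curve. Only the general 3PL form is standard; the parameter values in the usage line are arbitrary placeholders, and how the count is mapped onto the curve in the framework itself is not reproduced here.

```python
import math


def distinct_contents(contents):
    """Number of different contents filled in for one attribute (None = undisclosed)."""
    return len({c for c in contents if c is not None})


def min_spurious(contents):
    """Lower bound on spurious contents: at most one distinct content can be real."""
    return max(distinct_contents(contents) - 1, 0)


def three_pl(x, a, b, c):
    """Generic 3PL item response function: c + (1 - c) / (1 + exp(-a * (x - b)))."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (x - b)))


filled = ["A", "B", "A", "C", "B"]
print(min_spurious(filled))  # 2, as in the example in the text
print(three_pl(min_spurious(filled), a=1.0, b=2.0, c=0.2))  # placeholder parameters
```

For (A, B, A, C, B) there are three distinct contents, so at least two of them must be spurious, in line with the example above.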

Visibility
Through the abovementioned work, we can quantify extraction difficulty, accessibility, reliability, and privacy awareness. To calculate visibility, we chose the half-suppressed fuzzy C-means clustering algorithm [39], which is a clustering algorithm based on FCM (Fuzzy C-means Algorithm).
However, the original algorithm has high time and space complexity, so we simplified it to mitigate these problems. First, we remove the step for training an SVM (Support Vector Machine) because our input has only four dimensions and removing the SVM greatly increases efficiency, while the clustering result remains sufficient. Second, we change the iteration of the algorithm: if the last step determines that the iteration does not end, we restore the new cluster centre to our pre-set cluster centre.
We chose the half-suppressed fuzzy C-means clustering method because it can achieve sufficient results using very few training samples, which matters given the lack of an open large-scale sample set and the specific requirements of data such as ours. In addition, this method divides the sample space into several categories according to the value of the sample cluster membership, converting the scoring problem into a classification problem that uses an A-F rating instead of the hundred-mark system, as in educational systems, which can yield better generalization.
Moreover, the algorithm allows the clustering centres to be specified manually, which is extremely easy to do in our research. According to common sense, we can formulate ideal clustering centres that reflect, for example, the difference between (1A, 1B, 1C, 1D, 1E) and (4A, 4A, 4A, 4A, 4A). Once we specify the cluster centres, we can determine the visibility of a sample from the cluster centre to which the sample is assigned.
In the sample space of the algorithm, $X = \{x_1, x_2, \ldots, x_n\}$ is the sample set, $V = (v_1, v_2, \ldots, v_c)$ represents the centre of each cluster, and $u_{ij}$ is the membership of sample $x_j$ in class $i$, which satisfies:

$$\sum_{i=1}^{c} u_{ij} = 1, \qquad u_{ij} \in [0, 1]$$

The specific process of the simplified algorithm is as follows: (1) Initialize the cluster centres $v_i^{(0)}$, the inhibiting factor α, the inhibition threshold β, the prime index factor m, the error threshold ε > 0, and the maximum number of iterations K, and set the iteration counter k = 0.
In this paper, we use accessibility, reliability, extraction difficulty, and privacy awareness as the four-dimensional sample input; then, we can obtain the visibility from the class into which the data are clustered. After conducting a number of experiments, we obtained the following model parameters: the inhibiting factor was 0.5, the inhibition threshold was 0.5, the prime index factor was m = 2, the error threshold was 0.01, and the maximum number of iterations was K = 30. We used the above parameters in the subsequent comparative analysis and set the outcome to six categories. The initial clustering centres have preset values.
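A minimal sketch of the classification step with fixed centres is given below. It assumes the usual FCM membership formula and one common reading of suppression, in which the winning membership is boosted and the others are scaled by α only when the winner exceeds the threshold β. The six centres are purely illustrative values, not the preset centres used in our experiments, and the function names are hypothetical.

```python
import math


def fcm_memberships(x, centres, m=2.0):
    """Standard FCM membership of sample x in each cluster, for fixed centres."""
    d = [math.dist(x, c) for c in centres]
    if 0.0 in d:  # the sample coincides with a centre
        hit = d.index(0.0)
        return [1.0 if i == hit else 0.0 for i in range(len(d))]
    inv = [di ** (-2.0 / (m - 1.0)) for di in d]
    total = sum(inv)
    return [v / total for v in inv]


def suppress(u, alpha=0.5, beta=0.5):
    """Boost the winning membership and shrink the rest, only above threshold beta."""
    w = max(range(len(u)), key=lambda i: u[i])
    if u[w] <= beta:
        return list(u)
    return [1.0 - alpha + alpha * v if i == w else alpha * v for i, v in enumerate(u)]


def visibility_class(x, centres):
    """Index of the cluster with the largest (suppressed) membership."""
    u = suppress(fcm_memberships(x, centres))
    return max(range(len(u)), key=lambda i: u[i])


# Illustrative centres only: six points evenly spaced between two corners of the
# (accessibility, reliability, extraction difficulty, awareness) space.
CORNER_A, CORNER_B = (1.0, 0.0, 1.0, 0.0), (4.0, 1.0, 3.0, 1.0)
CENTRES = [tuple(p + k / 5 * (q - p) for p, q in zip(CORNER_A, CORNER_B)) for k in range(6)]

print(visibility_class((3.5, 0.8, 2.5, 0.9), CENTRES))
```

Because suppression preserves the sum of memberships and never changes which cluster wins, the class returned here is the same as the nearest-centre class; the suppressed memberships matter mainly when centres are updated iteratively.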

Sensitivity
On social networking sites, there are far more attributes that affect user privacy than those we surveyed, and choosing reasonable attributes is crucial to the results. When calculating the final privacy score, the influence of various attributes on privacy leakage must be considered. This influence is called the sensitivity of the attribute.
In previous studies, Erfan et al. [38] used 11 attribute sensitivities calculated by Srivastava et al. [32], who used the naïve approach proposed by Liu and Terzi [34]. However, we do not adopt this approach. They used the profile data that a user actually fills in to calculate the sensitivity, which is unreasonable; in real life, due to the complexity of settings, the actual settings are not consistent with a user's expectation, which means that the profile settings cannot reflect the actual sensitivity of attributes.
In our study, we launched an online questionnaire about privacy sensitivity (https://www.wjx.cn/report/1647730.aspx). We divided attribute sensitivity into five levels: L1, not worried at all; L2, not worried; L3, no idea; L4, worried; L5, extremely worried. L1-L5 are given as the percentage of people who select that level. After excluding the large number of duplicate questionnaires, we obtained 364 valid questionnaires. Then, we calculated sensitivity using (7); the results are shown in Table 4. We set L4 as the benchmark for sensitivity and use a weight coefficient adjustment for L3 and L5.
The other difference between our approach and that of Srivastava et al. [32] is that we add the username and avatar to guard against SIL, which was not considered in the previous work. Meanwhile, because our research background is in China, we omit the attributes of religious views and political views.
The higher the privacy score p, the more severe the privacy leakage.
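As an illustration of how sensitivity and visibility combine, the sketch below uses the sensitivity-weighted sum from the privacy score of Liu and Terzi [34]. It assumes that this basic form carries over unchanged; the attribute names and numbers are made up for the example and are not taken from Table 4.

```python
def privacy_score(sensitivity, visibility):
    """Sensitivity-weighted sum of per-attribute visibility.

    Both arguments map attribute name to value and must share the same keys.
    """
    if sensitivity.keys() != visibility.keys():
        raise ValueError("sensitivity and visibility must cover the same attributes")
    return sum(sensitivity[a] * visibility[a] for a in sensitivity)


# Hypothetical values for two attributes.
theta = {"birthdate": 0.5, "phone": 1.0}
vis = {"birthdate": 2, "phone": 3}
print(privacy_score(theta, vis))  # 4.0
```

A highly sensitive attribute that is also highly visible dominates the sum, which is why a higher p indicates more severe leakage.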

Experimental Evaluation
To fully prove the advantages of our method, our experiment is divided into three parts. The data in [38] are used in the first part. The second part uses the data obtained through our research. Then, we use these data to compare the performance of the other two existing methods. Finally, we illustrate methods of improving user privacy through our method and prove that our method is effective at preventing SIL.

Experiment 1
In [38], the authors collected the data of 15 users (represented as user a to user o in the original paper) who were involved in four different online social networks (i.e., Facebook, ResearchGate, LinkedIn, and Google+); 11 attributes for each user were used to measure the information disclosure and privacy risk of those users. They reported that the chosen number of users covers a diverse range of values from user profiles, which is needed to show the effectiveness of the proposed privacy scoring method. To guarantee the fairness of the comparison, we use the data of two of the users (represented as user b and user o in the original paper), which were fully published in their paper, and the sensitivity they reported. The data of the remaining 13 users are not public. Study [38] compared the privacy disclosure score of all users with that provided by the privacy scoring model by Liu & Terzi [34] to evaluate the superiority of their proposed model; we also include the model of Liu & Terzi [34] in our experiment.
Figure 3 shows the normalized privacy score calculated using the three methods. Because user1 has a higher accessibility for most attributes and a lower extraction difficulty, the privacy score of this user is higher than that of the others, which means privacy leakage is severe. The result of Liu et al. [34] is lower because extraction difficulty and reliability have not been considered. Moreover, only binary values are used to represent accessibility, where 1 means accessible and 0 means inaccessible; therefore, the final value is low. When our method and Erfan's method [38] are compared, the final value in our method is relatively low, as Erfan's method [38] does not consider attribute content. Users may use spurious attribute content on a highly accessible platform and fill in real information on a more confidential platform. Our method outputs a higher score for user2 because user2 fills in the same content for most attributes (see Table 5), which suggests that the privacy awareness of user2 is very poor. Although extraction difficulty and accessibility are high, screen peeping, Trojans, and phishing attacks cannot be prevented; thus, accessibility is not as effective as imagined. Therefore, we believe that the privacy leakage of user2 is not as optimistic as shown by other methods.

It is worth noting that we consider the effect of attribute content, which is not considered in other research; thus, we include attribute content to support our method (A is the method in [38] and B is our method). The final result is shown in Table 6.
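To illustrate why attribute content matters, the following minimal sketch computes a simple content-consistency factor for one attribute across platforms. The function name content_consistency and the factor itself are hypothetical; this is an illustration of the idea that identical content on every platform is easier to link, not the formula used in our framework, and the example values are invented.

```python
from collections import Counter


def content_consistency(values):
    """Fraction of platforms that share the most common non-empty value.

    values: the content a user entered for one attribute on each platform;
    None or "" means the attribute was left blank on that platform.
    Returns 1.0 when every platform shows the same content and a smaller
    value when the user varies (or withholds) the content.
    """
    if not values:
        return 0.0
    filled = [v.strip().lower() for v in values if v and v.strip()]
    if not filled:
        return 0.0
    most_common_count = Counter(filled).most_common(1)[0][1]
    return most_common_count / len(values)


# Invented example: the same username on four platforms versus four different ones.
same_everywhere = content_consistency(["alice", "alice", "alice", "alice"])
varied = content_consistency(["alice", "bob", "carol", "dave"])
```

In this sketch, a user who repeats the same content everywhere obtains a factor of 1.0, whereas a user who supplies different content on each of four platforms obtains 0.25; a scoring method that ignores attribute content cannot distinguish these two cases.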

Experiment 2
Due to space limitations, we selected seven highly diverse students from the 28 users (marked in Figure 5) and show their details in Table 7. In this experiment, we use the sensitivity in Table 4. As shown in Figure 5, we computed the privacy scores of the 28 users on the four social networks mentioned in the previous section; the blue "•" indicates the seven users selected in our experiment, and the yellow "×" indicates the remaining 21 users. Moreover, the privacy scores have been normalized, and the majority are higher than 0.5. Although it is inevitable that some privacy will be compromised, the state of people's privacy is serious, and more attention should be paid to it. Notably, these results are based on members of a university, where user privacy awareness is generally high; thus, it can be expected that the privacy status elsewhere is more worrisome.
Figure 6 shows the scores of the seven people under the three methods. The score calculated by our method is lower in most cases because we account for users providing false content, in particular user2 and user5, who tend to provide different content on different platforms. In addition, since user7 provides the same content on most platforms, we believe that user7's privacy leakage is more serious than the other two methods indicate.
To reflect the advantages of our approach on multiple social networks, we also calculated privacy scores on two and on three social platforms in our samples for comparison.

As shown in Figure 7, scores increase with the number of platforms, which means that the more platforms users use, the more privacy exposure occurs. Liu & Terzi [34] do not consider the multiplatform scenario, so the score in their method does not change much. Meanwhile, because reliability and privacy awareness in our method increase with the number of platforms, users may obtain a lower score when they use more platforms; in that case, malicious attackers face more difficulty and higher costs in carrying out violations, which makes the privacy status of users more favorable.
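The exposure trend can be illustrated with a small sketch that scores every subset of k platforms and averages the result. The aggregation rule (an attribute counts as exposed at its highest accessibility across the selected platforms), the function names, and the data layout are assumptions for illustration; the sketch does not model the reliability and privacy-awareness terms of our method.

```python
from itertools import combinations
from statistics import mean


def exposure(profile, platforms, sensitivity):
    """Sensitivity-weighted exposure over the given platforms.

    profile: {platform: {attribute: accessibility in [0, 1]}}
    sensitivity: {attribute: sensitivity weight}
    Assumption: an attribute is as visible as on its most open platform.
    """
    score = 0.0
    for attribute, weight in sensitivity.items():
        visibility = max(
            (profile[p].get(attribute, 0.0) for p in platforms), default=0.0
        )
        score += weight * visibility
    return score


def mean_exposure(profile, sensitivity, k):
    """Average exposure over all subsets of exactly k platforms."""
    if not 1 <= k <= len(profile):
        raise ValueError("k must be between 1 and the number of platforms")
    return mean(
        exposure(profile, subset, sensitivity)
        for subset in combinations(sorted(profile), k)
    )
```

Under this assumed rule, adding a platform can only keep or raise the exposure of each attribute, which reproduces the increasing trend; it cannot reproduce the lower scores that arise when reliability and privacy awareness are taken into account.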

Experiment 3
In this section, we use user7 as an example to show how users can reasonably change their profile settings to reduce privacy leakage and guard against SIL using our method.
In Figure 8, we can see that the privacy score of each attribute decreases significantly after the settings are changed according to Table 8. Here, we provide one example of how to reduce privacy leakage using our method. Because people have varying requirements for privacy protection in the real world, there is no universal protection method. The fundamental purpose of the privacy score in this paper is to stimulate and cultivate the privacy awareness of users so that their privacy is protected from the outset. Users can intuitively understand their privacy status from the privacy score.
To demonstrate the effectiveness of our approach in preventing SIL, we chose three SIL methods that use profile matching [16,17,20] (represented by method 1, method 2, and method 3). First, we formed a dataset by randomly crawling profiles on Sina Microblog; this dataset includes 2000 profiles. Then, we added the Sina Microblog and Kaixin.com profiles of the 28 people in our study to the dataset. Finally, we use their Tencent Microblog and QQ profiles as testing data to match against the 2056 profiles in the dataset.

In this experiment, we ran each method 1000 times and recorded the precision of successful matches. Moreover, when changing the profile settings of our testing data, we did not deliberately differentiate the testing data from the dataset. Therefore, some profiles can still be matched after the settings are changed. As Figure 9 shows, because most people do not deliberately set different usernames, avatars, and other highly recognizable attributes on different OSNs, the probability of a user being successfully matched is very high. After the settings are changed, this probability is greatly reduced.
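The evaluation procedure can be sketched as follows. This is a minimal illustration that links profiles by username similarity only, using Python's difflib; it is not one of the three methods in [16,17,20], and the threshold, function names, and data layout are assumptions.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link(test_profiles, dataset, threshold=0.8):
    """Link each test username to its most similar dataset username.

    test_profiles, dataset: {profile_id: username}
    A link is reported only if the best similarity reaches the threshold.
    """
    links = {}
    for test_id, test_name in test_profiles.items():
        best_id, best_score = None, 0.0
        for data_id, data_name in dataset.items():
            score = similarity(test_name, data_name)
            if score > best_score:
                best_id, best_score = data_id, score
        if best_id is not None and best_score >= threshold:
            links[test_id] = best_id
    return links


def precision(links, truth):
    """Share of reported links that are correct (0.0 if none reported)."""
    if not links:
        return 0.0
    correct = sum(1 for t, d in links.items() if truth.get(t) == d)
    return correct / len(links)
```

In such a setup, a user who keeps the same or a nearly identical username across OSNs is linked easily, whereas a user who switches to an unrelated username falls below the threshold and is no longer reported as a match.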

Conclusions and Future Work
With the growing number of users and the increasing influence of social networks, the protection of privacy is an urgent problem that needs to be solved. People are no longer satisfied with a single social network, and different social networks expose different aspects of PII according to their purposes. This PII can be combined to reconstruct a user's real identity, which can lead to serious harm. Methods of avoiding such harm are challenging to develop. In this paper, we consider accessibility, extraction difficulty, reliability, and privacy awareness. Then, we use the simplified half-suppressed fuzzy C-means clustering algorithm to calculate visibility; using sensitivity, we can calculate a final privacy score for users in a multiplatform scenario. Through the privacy score, a user's personal information


(3) Use the correction equation above to obtain $U^{(k)}$ from the fuzzy classification matrix $u$.

(4) Calculate the new cluster center from $v_i$ =

Figure 8. Privacy score after profile changes.

Table 2. Attribute content and accessibility.
Using Table 2, we can apply Algorithm 1 to calculate the accessibility of all attributes.

Table 5. Attribute accessibility and content.

Table 7. Attribute content and accessibility.
