The Research of “Products Rapidly Attracting Users” Based on the Fully Integrated Link Prediction Algorithm

One of the main problems encountered by social networks is the cold start problem. The term “cold start problem” refers to the difficulty in predicting new users’ friendships due to the limited number of links those users have with existing nodes. To fill the gap, this paper proposes a Fully Integrated Link Prediction Algorithm (FILPA) that describes the social distance of nodes by using “betweenness centrality,” and develops a Social Distance Index (SDI) based on micro- and macro-network structure according to social distance. With the aim of constructing adaptive SDIs that are suitable for the characteristics of a network, a naive Bayes (NB) method is first adopted to select appropriate SDIs according to the density and social distance characteristics of common neighbors in the local network. To avoid the loss of accuracy caused by blindly combining SDIs, the AdaBoost meta-learning strategy is applied to develop a Fully Integrated Social Distance Index (FISDI) composed of the best SDIs screened by NB. The possible friendships among nodes are then comprehensively presented using the high-performance FISDI. Finally, in order to realize “products rapidly attracting users” in new user marketing, FILPA is used to predict the possible friendship between new users in an online brand community and others in different product circles. By making use of the influence of friendship, products can accumulate a customer base and expand word-of-mouth quickly with a high conversion rate to achieve successful product marketing.


Introduction
Social networks can provide a low-cost, effective, and efficient marketing platform for enterprises. By leveraging social networks, marketing enterprises can swiftly grow their user base, accumulate user groups, and realize "products rapidly attracting users" among new users. An online brand community based on a social network is a relational network grouped by users with common interests and desires for brands. This group is characterized by close unity, highly efficient communication, and convergent actions [1]. In order to expand the marketing scale and benefits of an online brand community, marketers should attract new users to join the brand community and keep purchasing the brand's products. Therefore, it is necessary to quickly foster friendships between new users and those of the brand community, and then carry out marketing to new users by leveraging the influence of these friendships.
Based on the understanding, homogeneity, and trust between friends [2], users can get valuable, trustworthy [3], and high-quality [4] commodity information through friend-driven communication. At the same time, friendship leads users to be more inclined to believe that the sales operations conducted by marketers are genuinely based on the quality of the product [5]. As a consequence, users are more willing to accept information shared by friends [6], making it easier to stimulate users' intent to purchase [4] and influence their purchasing decisions [7]. For example, the Florida government set up Students Work

Link Prediction
The link prediction problem can be formally defined as follows: given a snapshot of a social network at time t, infer the subset of links that are missing at the current time, or predict the subset of links that will be added to the network by a future time t + ∆, by defining a similarity function or probability function, or by using supervised learning or optimization technology [13]. Link prediction based on complex networks has attracted the attention of researchers from many fields, such as community structure detection [14], rumor spreading, and security monitoring [15]. The existing link prediction methods can be classified into similarity-based methods, probabilistic statistical techniques, algorithmic methods, and preprocessing approaches [13]. Among these, the similarity method has become the most studied because of its low computational complexity and its potential for application in large-scale networks [13,16].
In the light of the amount of information in the network topology, the similarity method can be divided into two categories: local method and global method [17]. The method based on local similarity considers the structure information related to the neighborhood of nodes. If two nodes are similar, they are more likely to be linked together. The disadvantage of the method based on local similarity is that there is little information about common neighbors, which cannot solve the problems of network sparsity and cold start. Some scholars consider the global similarity method and use the topological information of the whole network to score the unlinked node pairs [18]. For example, Rafiee S et al. [19] considered clustering coefficient and the neighbors of shared neighbors, which led to better performance. Zareie A et al. [20] considered direct and indirect common neighbors to predict potential relationships. While this increased the amount of information, it did not provide a lot of effective information. Nicholas and James emphasized the Three Degrees of Influence Rule [12], that is, social distance could affect the relationship between users in a social network. In addition, existing literature found that betweenness centrality best quantifies the path connectivity [21]. For example, scholars divided a network into communities and used the centrality of nodes [22] or community centrality [23] to describe the similarity of nodes for prediction, thereby obtaining good prediction results. Therefore, in order to enrich the available information of link prediction in the network, this paper uses the social distance of related nodes, which is described by betweenness centrality in a wide range, to solve the problems of network sparsity and cold start.
Because a single SDI cannot fully describe the network characteristics and cannot be applied to the cold start environment faced by the prediction of new users' friends, it is necessary to develop a combined link prediction algorithm. In the literature, a lot of combined link prediction algorithms have been developed using complex network characteristics. For example, Lu et al. [24] combined rich semantic and time information, fully mining the meta-path in the network, and proposed a unified link prediction framework (UniLPF) to predict academic relationships in academic networks based on the similarity method. Ozcan et al. [25] combined time information, correlation between link evolution and multi-type relationships, and the link information of nodes, and using local and global similarity methods, put forward a new method called multivariate time series link prediction to predict links in dynamic, undirected, weighted, or unweighted heterogeneous social networks. Hristova et al. [26] constructed a multi-layered online social network using Twitter and location-based social networks, and used a similarity method to predict links across social networks according to their structure and interaction characteristics. Ozcan et al. [27] used a nonlinear autoregressive exogenous model (NARX) and proposed a multivariate method to predict links, which combined the correlation between different link types and the influence of different local and global topological similarity methods in different time periods.
The above combined algorithms do not consider the complementarity between single indexes. Therefore, they cannot guarantee a comprehensive description of the connection possibilities between nodes. To fill the gap, this paper proposes a method of selecting the appropriate SDI according to the maturity of the network. Due to the diversity of network structures, a single index can only describe the characteristics of social distance from a single point of view, and combining single indexes randomly can cause the repeated description of network characteristics [10], which cannot enrich the network characteristics described by the integrated index. For example, Lü L et al. [28] applied weighted indices to the co-authorship network and the US air transportation network; they found that the weighted indices could not provide better performance. Liben-Nowell and Kleinberg [29] reported a similar observation for the weighted and unweighted Katz indices, which use all paths between two nodes in a network to measure node similarity. In order to fully find the characteristics of potential social distance, this paper uses the AdaBoost meta-learning strategy to identify the FISDI composed of the best SDIs screened by NB, and adopts the high-performance FISDI to comprehensively describe the possibility of network users becoming friends.

Product Marketing in Brand Community
Define the online brand community as P(V,E), where V is the node set, representing users in the community, and E is the edge set, representing the friendship among users. In the brand community, users will form a circle because they like the same kind of products. Assume that, in the online community P, the user set in the circle of product D is defined as V_D. In the process of "products rapidly attracting users" for marketing to new users, users in the circle of D are recommended to new users as friends, and product marketing is realized with the influence of friendship. Figure 1 shows the schematic diagram of "products rapidly attracting users" based on the users' friend recommendations in the circle of product P. Figure 1a represents the network before evolution, and Figure 1b shows the network after evolution. In Figure 1a, the users that belong to online community P are 1, 2, 3, 4, and 5. Assuming that the new user is 6, the purpose of friend recommendation is to enable user 6 to establish friendship with the users in circle P, so as to realize "products rapidly attracting users". By making use of the influence of friendship, we can help products accumulate a customer base and expand word-of-mouth quickly with a high conversion rate to achieve successful product marketing.

SDI for Friend Recommendation
In general, new users have few friends; the cold start problem is faced when attempting to recommend friends to new users for the purpose of product marketing [30]. In this case, traditional SLPA cannot guarantee accurate friend recommendations for new users. To solve this problem, this paper proposes a new link prediction index based on the Three Degrees of Influence Rule (TDIR) [12]. TDIR states that, if the social distance is within three degrees, it can be called a strong connection and can trigger actions; while weak connections will occur when social distance is more than three degrees, prompting only the transmission of information between users. Generally speaking, whatever we do or say, it will have impacts on our friends (first degree), our friends' friends (second degree), and even the friends of our friends' friends (third degree). If the relationship is beyond the third degree, it will reduce our influence. According to TDIR, this paper presents the social intensity of users and predicts the possibility of connection between nodes by the shortest social distance between users instead of being confined to a social distance within three degrees.
Similarly, if the social distance between a user's friends and other users is close, we can assume that the user potentially has a close social distance to those other users (which will show up in the evolved network). Two nodes will therefore become friends more easily when their common friends have a closer social distance to other users.
Firstly, betweenness centrality is used to represent the social distance between any node z and other nodes. It is computed from the fraction of shortest paths that pass through z, $n_{ij}(z) / n_{ij}$, together with a constant parameter $\theta$. Here $\theta$ represents the constant parameter, $|V|$ is the number of nodes, $n_{ij}$ is the total number of shortest paths between the node pair $(i, j)$, and $n_{ij}(z)$ is the number of those shortest paths that pass through node z. $s(x)$ expresses the potential social distance of node x and is computed over $\Gamma(x)$, the neighbor set of node x. In each node pair (x, y) to be predicted, one is a new node and the other is a known node in the circle.
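The quantities $n_{ij}$ and $n_{ij}(z)$ defined above can be computed directly with breadth-first search. The following is a minimal sketch for an unweighted, undirected network stored as a dictionary of neighbor sets. The function names are hypothetical, and the final scaling `theta * b + 1` is only an assumed reading of how $\theta$ enters the social distance, since the exact formula is not reproduced in this text.

```python
from collections import deque


def bfs_counts(adj, source):
    """Return (dist, sigma): hop distance from source and number of shortest paths."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:              # first time v is reached
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:     # u lies on a shortest path to v
                sigma[v] += sigma[u]
    return dist, sigma


def betweenness(adj):
    """Sum of n_ij(z) / n_ij over unordered pairs (i, j) with i, j != z."""
    nodes = list(adj)
    info = {s: bfs_counts(adj, s) for s in nodes}
    score = {z: 0.0 for z in nodes}
    for a, i in enumerate(nodes):
        dist_i, sigma_i = info[i]
        for j in nodes[a + 1:]:
            if j not in dist_i:            # i and j are disconnected
                continue
            for z in nodes:
                if z == i or z == j or z not in dist_i:
                    continue
                dist_z, sigma_z = info[z]
                # z is on a shortest i-j path iff the distances add up.
                if j in dist_z and dist_i[z] + dist_z[j] == dist_i[j]:
                    score[z] += sigma_i[z] * sigma_z[j] / sigma_i[j]
    return score


def assumed_social_distance(adj, theta=1.0):
    """ASSUMPTION: scale betweenness by theta and add 1 (form not confirmed by the text)."""
    return {z: theta * b + 1.0 for z, b in betweenness(adj).items()}
```

The pairwise formulation is cubic in the number of nodes, which is acceptable for small local networks; a larger network would normally use Brandes' accumulation scheme instead.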
On the basis of the principle of the potential social distance for constructing friendship, this paper builds SDIs based on a micro-and macro-network to comprehensively describe the implicit social distance information in an extremely sparse network.

SDI Based on Microstructure
Firstly, based on the principle that the closer the social distance between the common neighbor of two nodes and other nodes, the more likely it is that those two nodes will become friends, this paper proposes five indicators. In detail, CND0 is the direct influence of the distance between the pair of nodes, which is caused by the social distance between the common neighbor and other nodes. CND1-CND4 are the social distance influence coefficients of the common neighbor (the ratio of the social distance between a common neighbor's direct neighbors and other nodes to the social distance between their indirect neighbor and other nodes), and a larger coefficient will help spread the influence of social distance in the local network better.
In this case, the closer the social distance between the direct neighbors of the common neighbor and other nodes, the easier it is for this node pair to become friends. The details are shown in Table 1.

Table 1. SDI based on microstructure.

SDI Based on Macrostructure
The cold start problem is normally caused by the small number of links belonging to a new node. This section proposes additional SDIs, which mainly take the role of indirect common neighbors (i.e., friends of common neighbors) into account. Specifically, the social distance of common neighbors' friends can change the connection of common neighbors and can have indirect effects on whether the target node pairs can establish links. These newly proposed SDIs are CCND1, CCND2, CCND3, and CCND4.
(a) CCND1
CCND1 is the social distance influence coefficient, that is, the ratio of the social distance between the direct neighbors of a common neighbor and other nodes to the social distance between the indirect neighbors of that common neighbor and other nodes. The larger the coefficient, the better the conductivity of the influence of social distance in the local network. In this case, the closer the social distance between the direct neighbor of the common neighbor and other nodes, the easier it is for this node pair to become friends.

(b) CCND2
Furthermore, this paper considers the influence coefficient of social distance of common neighbors and the influence of the combination of social distance between neighbors on the social distance between node pairs. Therefore, CCND2 is defined as shown in Formula (2).
Among them, $c(z)$ represents the agglomeration coefficient of node z, indicating the social distance between its neighbor nodes, namely, $c(z) = \frac{2 e_z}{|\Gamma(z)| \, (|\Gamma(z)| - 1)}$, where $e_z$ is the actual number of edges between neighbors of node z, $|\Gamma(x)|$ expresses the number of neighbors of node x, and $\alpha$ and $\beta$ are constant parameters.
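As an illustration of the agglomeration coefficient $c(z) = 2 e_z / (|\Gamma(z)| (|\Gamma(z)| - 1))$, here is a small sketch. The function name and the dictionary-of-sets graph representation are assumptions made for the example, and returning 0 for nodes with fewer than two neighbors is a convention chosen here, not something stated in the text.

```python
from itertools import combinations


def clustering_coefficient(adj, z):
    """c(z) = 2 * e_z / (k * (k - 1)), with k = |Gamma(z)|.

    adj maps each node to the set of its neighbors (undirected graph).
    """
    neighbors = adj[z]
    k = len(neighbors)
    if k < 2:
        return 0.0  # convention assumed here: undefined cases count as 0
    # e_z: number of edges that actually exist between neighbors of z
    e_z = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2.0 * e_z / (k * (k - 1))
```

The moderating term 1 − c(z) used for CCND3 can then be obtained as `1.0 - clustering_coefficient(adj, z)`.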

(c) CCND3
For any node x, we analyze the influence of the social distance between its neighbors and other nodes, as well as the influence of the social distance between neighbors on the social distance of other nodes, and define the combination influence coefficient of social distance. At the same time, we consider the moderating effect of the focus degree of common neighbors on node pairs (represented by 1 − c(z)) on this combination influence coefficient. Through the combination of the above coefficients, we describe the combination coefficient of social distance influence (namely CCND3). The larger the coefficient, the better the conductivity of social distance influence, as shown in Formula (3).
(d) CCND4
We integrate the social distance influence coefficient CND4 of the micro-network with the social distance influence coefficients CCND2 and CCND3 of the macro-network to define a new combined index CCND4, as shown in Formula (4), where $\delta$, $\varepsilon$, and $\sigma$ represent the weight parameters of $S^{CND4}_{xy}$, $S^{CCND2}_{xy}$, and $S^{CCND3}_{xy}$, respectively.

FILPA
To overcome the cold start problem of new users' friend recommendation, this section mines the information contained in the network structure, namely, that different nodes have different social distances within the network, and then puts forward FILPA. The possibility of new users becoming friends with other nodes in different circles is fully described by adaptively building a high-performance FISDI for a specific circle structure. FILPA considers the relationship between the joint effect of various characteristics of the local network and the prediction performance of the algorithm, selects the NB model, and chooses the appropriate SDIs according to the maturity of the network. Moreover, a single SDI often produces an estimate that is too high or too low, and a random combination of SDIs cannot guarantee good results every time. FILPA therefore adopts the AdaBoost meta-learning strategy to identify the fully integrated index composed of suitable SDIs screened by NB, then selects the index with the highest precision. Figure 2 shows the structure of FILPA. In Figure 2, when the accuracy of the combination of scoring index A and the other two indexes is higher than that of index A alone, the two indexes are considered to be the integrated index.

Algorithm Evaluation
The area under the curve (AUC) [31] is the most commonly used standard index for measuring the accuracy of link prediction algorithms. AUC randomly selects the connected and unconnected node pairs in the test set, and compares their score values obtained by SDIs. In m independent comparisons, if the number of connected node pairs with a higher score is m1 times, and the number of connected node pairs with a higher or equal score is m2 times, then the AUC is shown in Formula (5).
When the network size is large, computing the AUC by this random sampling method reduces the computational complexity and improves the computation efficiency. The larger the AUC value, the higher the accuracy of the algorithm.
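The sampling procedure can be sketched as follows. This is an illustrative sketch, not the paper's implementation: it assumes the common convention AUC = (m1 + 0.5·m2) / m, in which m1 counts comparisons where the connected pair scores strictly higher and m2 counts ties. The function name, the `score` callback, and the fixed random seed are hypothetical choices for the example.

```python
import random


def sampled_auc(score, connected, unconnected, m=10000, seed=0):
    """Estimate AUC from m independent random comparisons.

    score(x, y) returns the SDI score of node pair (x, y);
    connected / unconnected are lists of node pairs from the test set.
    ASSUMPTION: ties are counted with weight 0.5 (standard sampled AUC).
    """
    rng = random.Random(seed)
    m1 = 0  # connected pair scored strictly higher
    m2 = 0  # connected and unconnected pair scored equally
    for _ in range(m):
        s_pos = score(*rng.choice(connected))
        s_neg = score(*rng.choice(unconnected))
        if s_pos > s_neg:
            m1 += 1
        elif s_pos == s_neg:
            m2 += 1
    return (m1 + 0.5 * m2) / m
```

Under this convention an index that scores pairs at random yields a value near 0.5, so the amount by which the AUC exceeds 0.5 indicates how much better than chance the index performs.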

NB for Screening the Best SDI
Since the NB model requires few estimated parameters, is not very sensitive to missing data, and the algorithm is relatively robust, FILPA adopts the NB model to adaptively select SDIs suitable for the link prediction characteristics of new users, then selects appropriate SDIs according to the network maturity. In this paper, the NB model is used to select the appropriate SDI according to the density of common neighbors and the measure of social distance.

Discriminant Factors for the Best Indexes
In order to select SDIs suitable for different networks, this paper uses network maturity to distinguish the best indexes. The evaluation of network maturity includes the quality and quantity characteristics of networks. The quantity aspect refers to the network density and network connection scale. The qualitative aspects include the social distance of the node in the networks, and the stability, diversity, and dispersion of network connections [32]. This paper describes the network maturity characteristics of the local network comprehensively from two dimensions, that is, the density of common neighbors and the social distance between the common neighbor and other nodes, then selects the appropriate SDIs accordingly. In order to overcome the influence of cold start on the accuracy of the algorithm, for each node pair to be predicted, the characteristics of its direct common neighbors and one-step, two-step, and three-step indirect common neighbors are considered simultaneously.

(a) Density of common neighbors
The more common neighbors a node pair has, the more intimate the social distance between them, and the more likely the pair is to become friends. The network features related to the density of common neighbors include the density [33] and the mean eigenvector centrality [34], as shown in Formulas (6) and (7).
where $C_E(i) \propto \sum_{j \in \Gamma(i)} C_E(j)$ can be obtained by recursively solving for the eigenvector centrality of node i, and $|E|$ represents the total number of edges of the network. The specific algorithm is described as follows.
for all i; } Until (λ stops changing)
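Only the closing lines of the iterative procedure survive here, so the following is a hedged sketch of a standard power iteration that is consistent with the relation $C_E(i) \propto \sum_{j \in \Gamma(i)} C_E(j)$ and with the stopping rule "until λ stops changing". The function name, tolerance, and iteration cap are assumptions. The update uses A + I instead of A, a design choice made here (not taken from the text) so that the iteration also converges on bipartite networks; it leaves the leading eigenvector unchanged.

```python
def eigenvector_centrality(adj, tol=1e-9, max_iter=1000):
    """Power iteration for eigenvector centrality on an undirected graph.

    adj maps each node to the set of its neighbors. Scores are scaled so
    that the largest one equals 1.
    """
    c = {i: 1.0 for i in adj}
    lam = 0.0
    for _ in range(max_iter):
        # One multiplication by (A + I): own score plus neighbors' scores.
        new = {i: c[i] + sum(c[j] for j in adj[i]) for i in adj}
        new_lam = max(new.values())          # current eigenvalue estimate
        c = {i: v / new_lam for i, v in new.items()}
        if abs(new_lam - lam) < tol:         # lambda stopped changing
            break
        lam = new_lam
    return c
```

The mean eigenvector centrality of a local network can then be taken as the average of the returned values over the nodes of interest.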

(b) Social distance between the common neighbor and other nodes
The more shortest paths pass through the common neighbors, the more intimate the social distance between the common neighbor and other nodes. Therefore, we considered the indexes related to the shortest paths, such as average proximity centrality, average betweenness centrality, and the size of average connected groups [35], as shown in Formulas (8)-(10).
where t represents the number of connected groups, $N(o)$ represents the size of the o-th connected group, and $d_{ij}$ represents the shortest path between node i and node j.

NB Model
The independent variables (X) of the learning sample are the above five local network characteristic indexes, and the dependent variable (Y) is the label of the SDI with the largest AUC value. Suppose that the whole training network set is $W = (X_1, X_2, X_3, X_4, X_5, Y)$. Each network, characterized by the network characteristic indexes mentioned in Section 4.2.1, is assigned to the class $y_i$ corresponding to the SDI with the largest AUC value. Assume that there are L kinds of labels, that is, $y_i \in F = \{y_1, y_2, \ldots, y_L\}$, so that a partition $\{T_1, T_2, \ldots, T_L\}$ of W is obtained. When the input network feature vector is $(x_1, x_2, x_3, x_4, x_5)$, the probability that the sample belongs to class $y_i$ is shown in Formula (11).

$$P(Y = y_i \mid X_1 = x_1, \ldots, X_5 = x_5) = \frac{P(X_1 = x_1, \ldots, X_5 = x_5 \mid Y = y_i) \, P(Y = y_i)}{P(X_1 = x_1, \ldots, X_5 = x_5)} \quad (11)$$

Based on the definition of each feature, and assuming that the features are independent of each other, the numerator of Formula (11) can, according to the principle of NB, be further expressed as $P(Y = y_i) \prod_{j=1}^{5} P(X_j = x_j \mid Y = y_i)$. Because the denominator of Formula (11) is the same for all classes, the class with the largest numerator is the class to which the sample belongs. Therefore, the NB classifier can be expressed as Formula (12).

$$\hat{y} = \arg\max_{y_i \in F} \; P(Y = y_i) \prod_{j=1}^{5} P(X_j = x_j \mid Y = y_i) \quad (12)$$
In Formula (12), $P(Y = y_i)$ $(i = 1, 2, \ldots, L)$ represents the prior probability of the class $y_i$, which is obtained by the maximum likelihood estimation criterion, namely, $P(Y = y_i) = |T_i| / |W|$. $P(X_j = x_j \mid Y = y_i)$ represents the conditional probability density of the occurrence of the j-th local network feature of the training set in class $y_i$. Although normal distribution is the most popular approach to deal with continuous variables, it has the least information and the most uncertainty, which will result in a highly robust algorithm. Therefore, this paper adopts a non-parametric density estimation method, namely, kernel density estimation, to estimate the probability density function directly from the training samples in $T_i$. Among them, $x_j$ represents the j-th local network feature value of the input, $x_{ji}$ represents the j-th local network feature value of a training sample in $T_i$, h represents the smoothing parameter, which is set to 1, and K indicates the kernel function, which is set to the most common Gaussian kernel function, namely, $K(u) = \frac{1}{\sqrt{2\pi}} e^{-u^2/2}$. During the training of the NB model, the AUC value of each SDI is calculated for each sample in the training set, and its normalized standard deviation is represented by w, which will be used as the weight of the corresponding SDI in the integration algorithm in Section 4.3.
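To make the NB selection step concrete, the sketch below combines the prior $|T_i| / |W|$ with a Gaussian kernel density estimate per feature. It is an illustration under stated assumptions rather than the paper's code: the density uses the textbook form $\frac{1}{|T_i| h} \sum K\big((x_j - x_{ji}) / h\big)$, the function names and the training-data layout (a dictionary from SDI label to feature vectors) are hypothetical, and log-probabilities are used only for numerical stability.

```python
import math


def gaussian_kernel(u):
    """Standard Gaussian kernel K(u)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(x, samples, h=1.0):
    """Kernel density estimate at x from one class's values of one feature."""
    return sum(gaussian_kernel((x - s) / h) for s in samples) / (len(samples) * h)


def nb_predict(x, training, h=1.0):
    """Return the label maximizing P(Y) * prod_j P(X_j = x_j | Y).

    training: dict mapping each SDI label to a list of feature vectors
    (one vector of local network features per training network).
    """
    total = sum(len(rows) for rows in training.values())
    best_label, best_score = None, -math.inf
    for label, rows in training.items():
        score = math.log(len(rows) / total)          # log prior |T_i| / |W|
        for j, xj in enumerate(x):
            density = kde(xj, [row[j] for row in rows], h)
            score += math.log(density) if density > 0.0 else -math.inf
        if best_label is None or score > best_score:
            best_label, best_score = label, score
    return best_label
```

For example, with two hypothetical labels whose training features cluster around different values, `nb_predict` returns the label whose cluster lies closer to the input vector.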

Identifying FISDI Based on AdaBoost Meta-Learning Strategy
In order to further improve the accuracy of the model and avoid the performance degradation caused by the random combination of SDIs, FILPA introduces the AdaBoost meta-learning strategy to identify the FISDI composed of the best SDIs selected by NB. Because new user link prediction must cope with an extremely sparse local network structure, the accuracy of a single model in identifying the FISDI is not high. Through the AdaBoost meta-learning strategy, weak recognition models with low accuracy can be enhanced to become strong recognition models with high accuracy. In the AdaBoost meta-learning strategy, Discriminant Analysis (DA) is used as the base classifier and linear regression is used as the meta classifier; multiple base classifiers are combined to improve the classification accuracy. On the basis of the learning results of the base classifiers, the meta classifier relearns how to obtain the final results, so that the low-level learning can be fully used in the high-level induction process. Figure 3 shows an example of the AdaBoost meta-learning strategy.

Discriminant Factors of the Fully Integrated Index
Among the similarity criteria, Canberra, Sum of Absolute Difference (SAD), Reciprocal of Absolute Value (RAV), and Max-min are the most representative. Therefore, this paper uses them to measure the similarity between SDIs; their formulas are shown as (13)-(16), respectively, where $sim_k(i, j)$ is the similarity score of the node pair composed of nodes i and j, calculated according to index k in Section 3.2, the symbol ∧ represents the smaller of two alternatives, and the symbol ∨ represents the larger.
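The four measures can be illustrated on two score vectors, each holding the scores that one SDI assigns to the same list of node pairs. Formulas (13)-(16) are not reproduced in this text, so the definitions below are the commonly used textbook forms and should be read as assumptions: Canberra and SAD as distances, RAV as a constant divided by the sum of absolute differences (the scaling constant `M` is hypothetical), and Max-min as the ratio of summed minima (∧) to summed maxima (∨).

```python
import math


def canberra(a, b):
    """Canberra distance: sum of |a_k - b_k| / (|a_k| + |b_k|), skipping 0/0 terms."""
    return sum(abs(x - y) / (abs(x) + abs(y))
               for x, y in zip(a, b) if abs(x) + abs(y) > 0)


def sad(a, b):
    """Sum of Absolute Difference."""
    return sum(abs(x - y) for x, y in zip(a, b))


def rav(a, b, M=1.0):
    """Reciprocal of Absolute Value: M / sum |a_k - b_k| (M is an assumed constant)."""
    total = sad(a, b)
    return math.inf if total == 0 else M / total  # identical vectors: unbounded


def max_min(a, b):
    """Max-min similarity: sum of min(a_k, b_k) / sum of max(a_k, b_k)."""
    denom = sum(max(x, y) for x, y in zip(a, b))
    return sum(min(x, y) for x, y in zip(a, b)) / denom if denom > 0 else 1.0


def similarity_features(a, b):
    """Feature vector describing how similarly two SDIs score the same node pairs."""
    return [canberra(a, b), sad(a, b), rav(a, b), max_min(a, b)]
```

A vector such as the one returned by `similarity_features` is the kind of input that the base classifier described next would receive for each pair of SDIs.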

DA Base Classifier
In order to improve the efficiency of the algorithm, we limited the composite index to be composed of Q SDIs. The basic idea of the DA base classifier is to judge whether the Q SDIs can constitute a fully integrated index according to the similarity between two scores of the Q SDIs, which is calculated by Canberra, SAD, RAV, and Max-min.
Meta-learning is used in the process of training the DA base classifier. Assume that the total training sample set is $S = \{(a_i, b_i) \mid i = 1, 2, \ldots, n\}$, where the vector $a_i$ represents the similarity of any two of the Q SDIs calculated by the above four measures, and $b_i \in Y = \{0, 1, 2, 3, 4\}$. Label 0 indicates that the Q SDIs cannot form a fully integrated index, while labels 1, 2, 3, and 4 indicate the way in which the SDIs are combined to form a fully integrated index: 1 indicates that all indexes are additive; 2 indicates that the index is made up of a number of blocks, each composed of a combination of addition and subtraction of three indexes; 3 indicates that the index is made up of a number of blocks, each composed of a combination of subtraction and addition of three indexes; and 4 indicates that one index is subtracted from the other indexes. Based on meta-learning, the weight w calculated in Section 4.2 is used to combine the Q SDIs with the above linear combinations in order to comprehensively mine the characteristics of the implied social distance, and $y_i$ is the label corresponding to the fully integrated index with the largest AUC value.
The DA base classifier is denoted as h(u, v), where u is the similarity between algorithm scores, v is the category label (namely b_i), and the output value is the probability that u belongs to class v. Suppose that the i-th training input sample is (u_i, v_i) and that p represents any class other than v_i. We also define the operator ⟦r⟧: when r is true, ⟦r⟧ = 1; when r is false, ⟦r⟧ = 0. When p ≠ v_i, DA judges whether u_i ∈ v_i or u_i ∈ p. There are three situations when u_i is judged and classified: (1) when h(u_i, v_i) = 0 and h(u_i, p) = 1, then u_i ∉ v_i; (2) when h(u_i, v_i) = 1 and h(u_i, p) = 0, then u_i ∈ v_i; (3) when h(u_i, v_i) = h(u_i, p), u_i ∈ v_i and u_i ∈ p are equally likely, and one of them is chosen at random. Therefore, the probability that u_i is wrongly classified as p is shown in Formula (17) [36].
For the above five-class problem, there are four different values of p for each sample. Because each p may have a different importance in different situations, each p is given a specified weight q(i, p), with ∑_{p ≠ v_i} q(i, p) = 1. Therefore, Formula (17) is modified to Formula (18).
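Formulas (17) and (18) are not reproduced in this text. Assuming that DA follows the pseudo-loss construction of AdaBoost.M2 (an assumption, not confirmed by the text), the three cases listed above would be captured by expressions of the following form, where the left-hand-side names are hypothetical labels:

```latex
% Assumed form of Formula (17): probability that u_i is wrongly classified as p
\mathrm{err}(i, p) = \frac{1}{2}\bigl(1 - h(u_i, v_i) + h(u_i, p)\bigr)

% Assumed form of Formula (18): the same quantity weighted over all p \neq v_i
\mathrm{err}(i) = \frac{1}{2}\Bigl(1 - h(u_i, v_i)
                  + \sum_{p \neq v_i} q(i, p)\, h(u_i, p)\Bigr)
```

As a check against the three cases: h(u_i, v_i) = 0 and h(u_i, p) = 1 gives an error probability of 1; h(u_i, v_i) = 1 and h(u_i, p) = 0 gives 0; and equal outputs give 1/2, which corresponds to a random choice.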

AdaBoost Framework
The AdaBoost meta-learning strategy performs well on multi-class classification problems, so it is selected to identify the FISDI composed of the best SDIs. According to Formula (18), its pseudo-error can be expressed as Formula (19).
where D_t(i) represents the weight of the i-th sample; the larger its value, the more likely the i-th sample is to be misjudged. The label weighting function q_t(i, p) indicates the probability of wrongly classifying u_i into class p. The larger its value, the more easily the sample is misclassified, and the more attention it needs in the next iteration of learning. q_t(i, p) changes over the iterations so as to obtain the final global classification model and achieve a better classification effect. The main steps of the AdaBoost meta-learning strategy proposed in this paper are as follows:
Step 1: Generate the raw data S. For each sample in the training network set, the optimal index identified by the NB model is first removed from the L indexes, and the remaining L − 1 indexes are then combined in pairs, each pair forming a composite index together with the optimal index. The w calculated in Section 4.2 is taken as the weight of the corresponding SDI. For each group of indexes, the similarity between two scores is calculated according to Canberra, SAD, RAV, and Max-min, and the label is determined based on the AUC value.
Step 2: Input. The total training sample set is S = {(a_i, b_i) | i = 1, 2, ..., n}, and the number of iterations is T = 100. In each iteration, m × n samples are selected according to the sample distribution weight D obtained from the previous iteration, where m ∈ (0, 1) represents the proportion of selected samples. The algorithm ranks the sample distribution weights in descending order and selects the first m × n samples.
Step 3: Initialize variables. Let D_1(i) = 1/n; the weight of an incorrect label p in the i-th sample is ω^1_{i,p} = D_1(i)/4, where i = 1, 2, ..., n and p ∈ Y − {y_i}.
Step 4: Generate T DA base classifiers over T iterations. Repeat the following steps at the t-th iteration (t = 1, 2, ..., T):
a. Calculate the label weight according to Formula (20) and compute the sample distribution weight of the i-th sample based on Formula (21);
b. Train DA on the new sample set S_i obtained from the sample distribution D_t(i) to obtain the classifier h_t(u, v);
c. Calculate the pseudo-error ε_t of h_t according to Formula (19); if ε_t ≥ 0.5, jump to Step 5;
d. Calculate the proportion β_t = ε_t / (1 − ε_t) of the current base classifier and update the weight vector, as shown in Formula (22).
Step 5: When the iterations end, the base classifiers h_t(u, v) are linearly combined with different weights to obtain the final meta classifier H(u, v), as shown in Formula (23). H(u, v) is then applied to the test samples. According to the similarity between the scores of two indexes calculated by Canberra, SAD, RAV, and Max-min, the fully integrated index composed of the most suitable SDIs selected by NB is identified. The final classification results are obtained by weighted voting rules, as shown in Formula (24).
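The steps above can be sketched in code under several stated assumptions. Formulas (19)-(24) are not reproduced in this text, so the updates below use the standard AdaBoost.M2 forms, which appear consistent with the description (for example, the initial label weight D_1(i)/(K − 1) with K = 5 classes). The base learner `fit_centroids` is a hypothetical weighted nearest-centroid stand-in for DA, and the selection of the top m × n samples is replaced here by fitting the base learner with D_t as sample weights. The floors on ε_t and on the label weights are numerical safeguards added for this sketch only.

```python
import math

def fit_centroids(X, y, D, n_classes):
    """Hypothetical stand-in for DA: one D-weighted centroid per class.
    Assumes every class label occurs in y with positive weight."""
    dim = len(X[0])
    sums = [[0.0] * dim for _ in range(n_classes)]
    mass = [0.0] * n_classes
    for x, label, d in zip(X, y, D):
        mass[label] += d
        for j in range(dim):
            sums[label][j] += d * x[j]
    return [[s / m for s in row] for row, m in zip(sums, mass)]

def h(centroids, x):
    """h(u, v) for every class v: softmax over negative squared distances."""
    dist = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in centroids]
    low = min(dist)
    e = [math.exp(low - d) for d in dist]
    total = sum(e)
    return [v / total for v in e]

def adaboost_m2(X, y, n_classes, T=100):
    n = len(X)
    # Step 3: label weights D_1(i) / (K - 1) for wrong labels, 0 for the true label.
    w = [[0.0 if p == y[i] else 1.0 / (n * (n_classes - 1))
          for p in range(n_classes)] for i in range(n)]
    ensemble = []
    for _ in range(T):
        # Step 4a: label weights q_t(i, p) and sample distribution D_t(i).
        W = [sum(row) for row in w]
        total = sum(W)
        D = [Wi / total for Wi in W]
        q = [[wp / Wi for wp in row] for row, Wi in zip(w, W)]
        # Step 4b: train the base classifier under D_t.
        centroids = fit_centroids(X, y, D, n_classes)
        H = [h(centroids, x) for x in X]
        # Step 4c: pseudo-error (q is 0 at the true label, so the sum covers p != y_i).
        eps = 0.5 * sum(
            D[i] * (1.0 - H[i][y[i]] + sum(q[i][p] * H[i][p] for p in range(n_classes)))
            for i in range(n))
        if eps >= 0.5:
            break
        # Step 4d: beta_t and the weight update.
        eps = max(eps, 1e-6)
        beta = eps / (1.0 - eps)
        for i in range(n):
            for p in range(n_classes):
                if p != y[i]:
                    expo = 0.5 * (1.0 + H[i][y[i]] - H[i][p])
                    w[i][p] = max(w[i][p] * beta ** expo, 1e-300)
        ensemble.append((math.log(1.0 / beta), centroids))
    return ensemble

def predict(ensemble, x, n_classes):
    """Step 5: weighted vote of the base classifiers."""
    votes = [0.0] * n_classes
    for alpha, centroids in ensemble:
        for v, hv in enumerate(h(centroids, x)):
            votes[v] += alpha * hv
    return max(range(n_classes), key=votes.__getitem__)

# Toy data: three well-separated classes (illustrative values only).
X = [[0, 0], [0, 1], [10, 10], [10, 11], [20, 0], [20, 1]]
y = [0, 0, 1, 1, 2, 2]
ensemble = adaboost_m2(X, y, n_classes=3, T=5)
predictions = [predict(ensemble, x, 3) for x in X]
```

In the paper's setting, each row of `X` would be the four-dimensional similarity vector a_i and each label would come from Y = {0, 1, 2, 3, 4}; the toy data here only exercises the loop.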

High-Performance FISDI
FILPA further filters out the high-precision FISDI to overcome the lack of available information in link prediction for new users, and comprehensively describes the possibility of establishing links between users. Suppose that, for the most suitable SDIs identified by the NB model (expressed as B), the AdaBoost meta-learning strategy selects k fully integrated indexes, i.e., E_1(B), E_2(B), ..., E_k(B). FILPA then selects the one with the highest accuracy from these k fully integrated indexes, as shown in Formula (25).

Experimental Design
The validity of FILPA was verified on 971 ego-net data sets of Twitter provided by Stanford University. An ego-net is a social network composed of a user and their friends. Each ego-net represents a brand community, in which users who like the same kind of products form a circle. The data sets divide community members into different circles according to the products they favor. In total, 224 networks with product circles were selected from Twitter. In each experiment, 180 networks were randomly selected from the 224 networks as training samples and the remaining 44 networks were used as test samples. All experiments in this paper were implemented in MATLAB. Unless otherwise stated, the default parameter values of the software were used for each algorithm. Table 2 shows the average, minimum, and maximum statistical characteristics of the 224 network samples in Twitter. In the simulation, the data were preprocessed: the nodes with few links were screened out from all the nodes and were approximately defined as new users. Table 3 shows the indexes with specific parameter values. A large number of experimental results show that the algorithm can achieve high precision with these parameter values.
Table 3. Algorithm abbreviation with parameters.

In the NB experiments, 27 kinds of SDIs were screened on the training set and 14 kinds of SDIs with high performance were selected: CND3, CND4, CCND2a, CCND2c, CCND2d, CCND4a, CCND4b, CCND4c, CCND4e, CCND4g, CCND4j, CCND4l, CCND4m, and CCND4n. For each selected SDI, an AdaBoost meta-learning strategy was trained on the training set, and the fully integrated indexes were then found. These 14 SDIs therefore correspond to 14 AdaBoost meta-learning strategies.
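The sampling protocol of the experimental design (a random 180/44 split of the 224 networks, and new users defined as nodes with few links) can be sketched as follows. The original experiments were run in MATLAB; this Python version is only illustrative. The function names are hypothetical, and the degree threshold `max_degree` is an assumption, since the text does not state how few links qualify a node as a new user.

```python
import random

def split_networks(network_ids, n_train=180, seed=None):
    """Randomly split network identifiers into training and test sets."""
    rng = random.Random(seed)
    shuffled = list(network_ids)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

def new_users(adjacency, max_degree=1):
    """Nodes with at most max_degree links (threshold is an assumed parameter)."""
    return sorted(node for node, nbrs in adjacency.items() if len(nbrs) <= max_degree)

# 224 networks, 180 for training and the remaining 44 for testing.
train, test = split_networks(range(224), n_train=180, seed=0)

# Toy ego-net as an adjacency dictionary (illustrative only).
ego = {"a": {"b", "c", "d"}, "b": {"a"}, "c": {"a", "d"}, "d": {"a", "c"}}
low_degree_nodes = new_users(ego, max_degree=1)
```

Repeating the split with different seeds would give the repeated random experiments whose AUC values are averaged.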
In addition, to verify the performance of the proposed FILPA model, 10 classical SLPAs were adopted for comparison: CN, Salton, Jaccard, Sorenson, HPI, HDI, LHN, AA, RA, and RAA.
To evaluate the predictive performance of the algorithm, 100 experiments were carried out, and the average AUC value was taken as the performance index. The results are shown in Table 4, where FILPAa represents FILPA with different weight combinations, FILPAb represents FILPA without weight combinations (that is, all the weights are 1), and single NB means the direct use of the FISDI selected by the NB for prediction. Figure 4 shows the performance comparison between single NB and non-combined SDIs. Figure 5 and Table 5 present the performance comparison between FILPA and all reference methods, and Figure 6 and Table 6 present the performance comparison between the newly defined SDIs and the original SLPA. Table 5 and Figure 5 show that FILPA demonstrates a significant improvement over the existing algorithms. In other words, FILPA can accurately predict the likelihood that new users will establish friendships with other nodes in the community when marketing the brand products to new users.
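The AUC definition is not restated in this section. A minimal sketch, assuming the formulation commonly used in link prediction (the probability that a held-out true link receives a higher score than a nonexistent link, with ties counted as one half), is given below. The function name and the toy scores are hypothetical.

```python
def auc(missing_scores, nonexistent_scores):
    """AUC = (n_higher + 0.5 * n_ties) / n over all (missing, nonexistent) score pairs."""
    higher = 0
    ties = 0
    for s_missing in missing_scores:
        for s_none in nonexistent_scores:
            if s_missing > s_none:
                higher += 1
            elif s_missing == s_none:
                ties += 1
    total = len(missing_scores) * len(nonexistent_scores)
    return (higher + 0.5 * ties) / total

# Scores an index assigns to held-out links and to node pairs with no link (toy values).
held_out = [0.9, 0.8]
no_link = [0.1, 0.8]
value = auc(held_out, no_link)  # three wins and one tie out of four comparisons
```

A value of 0.5 corresponds to random scoring, so the averages reported in Table 4 are meaningful to the extent that they exceed 0.5.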

Performance Analysis of Algorithms
It can be seen from Table 4 that the performance of FILPAa is better than that of unweighted FILPAb, which indicates that the link prediction effect can be improved by constructing a high-performance fully integrated index with different weights. It can also be seen from Table 4 that the performance of FILPAa and FILPAb is better than that of NB and other algorithms, which shows that the mechanism of selecting FISDI through the AdaBoost meta-learning strategy and further identifying high-performance FISDI can provide better prediction results than the blind combination of SDIs.
Figure 6. Performance comparison of newly defined SDIs and original SLPA.
It can also be seen from Figure 6 and Table 6 that the new SDIs, developed by integrating the characteristic information of social distance into the original SLPA, perform significantly better than the original SLPA itself. This suggests that using rich information, such as the social distance of related nodes within a large range, to predict the likelihood of social relationship generation can overcome the sparse network and cold start problems, which algorithms based on a traditional local triadic closure structure cannot solve because the information on local common neighbors of new users is scarce. In addition, it can be concluded from Figure 4 that the accuracy of NB is higher than that of the other non-combined SDIs, which means that the mechanism of selecting the best SDIs based on the density and social distance characteristics of common neighbors in a local network is effective.
Finally, paired samples of the AUC of any two algorithms over the 100 experiments were collected, and a Z-test was used to determine whether the mean values of the two samples were statistically equal. All p-values were found to be below the 5% significance level, which indicates that there is a significant difference between the algorithms. These results show that FILPA can effectively overcome the cold start problem faced by new users in the prediction of their friendships, accurately recommend other nodes in the community to new users, and effectively realize brand product marketing to new users.
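The exact form of the Z-test is not specified in the text. The sketch below assumes a two-sided paired test on the per-experiment AUC differences, with the sample standard deviation standing in for the population value; the function name and the AUC values are hypothetical. Five runs are used only to keep the example short, whereas the normal approximation is better justified with the 100 runs used in the paper.

```python
import math
import statistics

def paired_z_test(auc_a, auc_b):
    """Two-sided paired Z-test on AUC differences (normal approximation)."""
    diffs = [a - b for a, b in zip(auc_a, auc_b)]
    standard_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    z = statistics.mean(diffs) / standard_error
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value

# Toy AUC values of two algorithms over the same five experiments.
auc_first = [0.90, 0.92, 0.91, 0.93, 0.94]
auc_second = [0.80, 0.81, 0.83, 0.82, 0.85]
z, p_value = paired_z_test(auc_first, auc_second)
significant = p_value < 0.05
```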

Conclusions
By leveraging the influence of friendships between new users and existing users in a brand community, new users can be attracted to join that community and purchase the brand's products, which helps realize "products rapidly attracting users" by expanding the scale of the brand community. Therefore, this paper proposes FILPA for predicting possible friendships between new users and existing users in a brand community.
The traditional link prediction algorithm derived from a local triadic closure structure cannot overcome the problems of sparse network and cold start caused by the scarcity of information related to new users' local common neighbors. Consequently, this paper proposes FILPA as a method for solving these problems by measuring the social distance of relevant nodes in an expansive region. Compared with the existing algorithms, the distinctive characteristics of this algorithm are as follows: Firstly, the traditional link prediction method based on triadic closure structure only focuses on local information, while the SDI proposed in this paper takes the global information into account, which enriches the available information when predicting the connection between new users and existing users in the brand community.
Secondly, the algorithm adopts a naive Bayes (NB) method to adaptively select the appropriate SDI according to network density and social distance. In contrast with the traditional fixed link prediction algorithm, it can not only choose the appropriate SDI in different network structures more efficiently, but can also improve the prediction accuracy.
Thirdly, the existing single-index prediction method can only describe the characteristics of social distance from one perspective, and a blind combination of single indexes cannot guarantee a comprehensive characterization of network information. In order to adequately describe the characteristics of implicit social distance, FILPA applies an AdaBoost meta-learning strategy to identify the FISDI composed of the best SDI screened by NB, and then explores the possibility of users becoming friends.
Twitter data were used to verify the performance of FILPA. The experimental results on 224 communities in the Twitter network show that the five kinds of SDIs in micro network structure perform better than the original SLPA, and that the macro-based SDI has significantly higher accuracy than an SLPA based on triadic closure structure. In addition, FILPA clearly outperforms the 10 existing classical algorithms. Therefore, FILPA can overcome the cold start problem faced when predicting the friendships of new users. By predicting the possibility of users becoming friends through FILPA, marketers can accurately recommend friends to new users in social networks and realize brand product marketing to new users, which will assist brand communities in quickly growing their customer bases, promote word-of-mouth, increase user stickiness, and realize successful product marketing in the embryonic period when the number of product users is still small. Additionally, the cost of marketing can be reduced by attracting customers through recommendations from friends.
This paper has some limitations. First, it only takes link prediction in a static network into consideration, and we will consider link prediction in a dynamic network in the future. Second, this paper only considers the social relations of nodes in homogeneous networks, and does not consider the real-time interactive information in the network. Both social contact and real-time interactive information of the network will be considered in the future when predicting friendships in the networks.