A Novel Centrality for Finding Key Persons in a Social Network by the Bi-Directional Influence Map

Symmetry is one of the important properties of social networks, indicating a co-existence relationship between two persons, e.g., friendship or kinship. Centrality is an index that measures the importance of vertices/persons within a social network. Many centrality indices have been proposed to find prominent vertices, such as eigenvector centrality and the PageRank algorithm. PageRank-based algorithms are the most popular approaches to this task, since they are well suited to the directed networks common in social media. In practice, however, finding truly important persons in a social network is complicated, since we should consider both how the influence of a vertex affects others and how many others follow a given vertex. Past PageRank-based algorithms reflect only one side of this importance and ignore the influence on the other side. In addition, past algorithms treat the transition from one status to the next as a linear process, without considering more complicated situations. In this paper, we develop a novel centrality to find key persons within a social network using a proposed synthesized index that accounts for both the inflow and outflow matrices of a vertex. Furthermore, we propose different transition functions to represent the relationship from status to status. The empirical studies compare the proposed algorithm with conventional algorithms and show the differences and flexibility of the proposed algorithm.


Introduction
Key person identification within a social network aims to find persons who can change the feelings, attitudes, or behaviors of other persons through network relationships [1]; it is therefore a critical issue in the fields of viral marketing [2], spread of opinions [3], rumor restraint [3], and innovation dissemination [4]. As we know, one of the major properties of social networks is the symmetry between nodes. Many algorithms have been proposed to identify important persons within a social network based on the concept of vertex centrality.
Vertex centrality measures the importance of persons within a network according to their position relative to others. These measures can be divided into local measures, short path-based measures, and iterative calculation-based measures [5]. The most famous local measure is degree centrality, which has been used [6,7] to identify the most influential persons within a social network. However, it only reflects the influence of an ego's neighbors and ignores the influence of more distant persons [8]. Note that an ego is the vertex on which we focus within a social network.
By contrast, short path-based measures calculate the influence of an ego by considering the shortest paths between any two vertices. These measures include closeness, betweenness, and Katz

Introduction of Centralities
The most common centralities in social network analysis are degree, closeness, and betweenness. Degree centrality is defined by the number of direct neighbors as an indicator of a network member's interconnectedness (Nieminen, 1974). Let a network be represented by a graph G(V, E), where V and E denote the sets of vertices and edges, respectively. Then, the degree centrality of the i-th vertex, v_i, can be represented as follows:

C_D(v_i) = deg(v_i),   (1)

where deg(·) is the degree of the vertex. If the graph is directed, we should account for the in- and out-degrees separately. In-degree centrality measures the popularity/prestige of a person, while out-degree centrality, by contrast, accounts for the sociality of a user [20,21]. Next, the idea behind closeness centrality is that a vertex that is closer to others can spread information very productively via the network [22], and is therefore important. Closeness centrality can be measured by the reciprocal of the sum of the vertex's distances from all others:

C_C(v_i) = 1 / Σ_{j≠i} d(v_j, v_i),   (2)

where d(·) denotes the distance between vertices. Finally, we can define betweenness centrality, which treats a vertex as a bridge along the shortest paths between pairs of vertices, as follows:

C_B(v_i) = Σ_{j≠k≠i} σ_{v_j v_k}(v_i) / σ_{v_j v_k},   (3)

where σ_{v_j v_k} denotes the number of shortest paths from vertex v_j to v_k and σ_{v_j v_k}(v_i) is the number of those shortest paths that pass through v_i. Although these three centralities are easily calculated, they only reflect the influence of vertices with respect to others in the topology of a social network, without considering the influence of their neighbors/friends, and they cannot be used as a comprehensive centrality for measuring key persons. Hence, eigenvector centrality was proposed to reflect the importance of neighbors.
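As a concrete illustration, the degree and closeness measures defined above can be sketched in a few lines of Python; the adjacency-list format, function names, and the BFS helper are our own illustrative choices, not part of the paper:

```python
from collections import deque

def degree_centrality(adj):
    # adj: dict mapping vertex -> set of neighbors (undirected graph)
    return {v: len(nbrs) for v, nbrs in adj.items()}

def shortest_path_lengths(adj, src):
    # BFS distances from src on an unweighted graph
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                q.append(w)
    return dist

def closeness_centrality(adj):
    # (n - 1) / sum-of-distances: a common normalized variant of the
    # reciprocal-of-distance-sum idea in Equation (2)
    scores = {}
    for v in adj:
        dist = shortest_path_lengths(adj, v)
        total = sum(dist.values())
        scores[v] = (len(dist) - 1) / total if total > 0 else 0.0
    return scores
```

Betweenness requires counting shortest paths through each vertex (e.g., Brandes' algorithm) and is omitted here for brevity.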

Eigenvector Centrality
First, we assume that the importance of a vertex within an undirected network is determined only by the influence of others and that a vertex achieves more centrality if it receives more in-degree flows from others. The eigenvector centrality of the i-th ego can be represented as follows:

EC(v_i) = (1/λ) Σ_j A_{j,i} EC(v_j),   (4)

where A_{j,i} is the element at the j-th row and i-th column of the adjacency matrix, which indicates the relationship from one vertex (row) to another (column), and λ is a fixed constant. For simplicity, we can represent Equation (4) in matrix form:

λ EC(v) = A^T EC(v).   (5)

Equation (5) indicates that the eigenvector centrality vector, EC(v), is an eigenvector of A^T and λ is the corresponding eigenvalue. Note that the initial EC(v) can be set as 1, i.e., the all-one vector. Usually, we select the maximum eigenvalue, λ_max, to ensure that EC(v) is larger than the zero vector. According to the Perron-Frobenius theorem [23], if every a_ij > 0, the eigenvector EC(v) of A associated with λ_max satisfies EC(v_j) > 0 for all j.
We can also let A^T be a row stochastic matrix, i.e., normalize A^T such that each row sums exactly to one. We can then rewrite Equation (5) as follows:

EC(v) = A^T EC(v).   (6)

The advantage of Equation (6) is that the eigenvector can easily be derived, since λ_max = 1. In addition, we can also derive the eigenvector by calculating the limiting power of A^T according to Markov chain theory.
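The limiting-power remark above suggests a simple power-iteration sketch for Equation (5); the NumPy-based function below, with its iteration cap and tolerance, is an illustrative assumption rather than the paper's implementation:

```python
import numpy as np

def eigenvector_centrality(A, iters=100, tol=1e-10):
    # Power iteration on A^T: repeatedly apply Equation (5) and renormalize,
    # starting from the all-one vector as noted in the text.
    n = A.shape[0]
    x = np.ones(n)
    for _ in range(iters):
        x_new = A.T @ x
        x_new /= np.linalg.norm(x_new)
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x
```

For a connected undirected graph with nonnegative entries, the iterate converges to the eigenvector associated with λ_max, consistent with the Perron-Frobenius argument above.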

Katz Centrality
The problem with eigenvector centrality is that it only suits undirected graphs. In addition, Equation (6) is not always reasonable: if a vertex only influences others, i.e., a vertex with no in-degree, its centrality becomes zero, even though it might play an important role in affecting others. Hence, we can introduce a constant, β, as a baseline weight given to the centrality of every ego, and add the scaling constant, α, to normalize the score. This method is called the Katz centrality [24]. Hence, we can re-write Equation (6) as follows:

KC(v_i) = α Σ_j A_{j,i} KC(v_j) + β,   (7)

where α < 1/λ_max to ensure a reliable result. We can also present Equation (7) in matrix form:

KC(v) = α A^T KC(v) + β 1.   (8)

Then, we can re-formulate Equation (8) and derive the Katz centrality vector by:

KC(v) = β (I - α A^T)^{-1} 1,   (9)

where 1 denotes the all-one vector.
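The closed-form Katz vector described above can be computed directly with one linear solve; the parameter defaults below are illustrative:

```python
import numpy as np

def katz_centrality(A, alpha=0.1, beta=1.0):
    # Closed-form Katz vector: KC = beta * (I - alpha * A^T)^{-1} @ 1,
    # valid when alpha < 1 / lambda_max so that the inverse exists.
    n = A.shape[0]
    return beta * np.linalg.solve(np.eye(n) - alpha * A.T, np.ones(n))
```

Note that a vertex with no in-degree still receives the baseline score β, which fixes the zero-centrality problem of eigenvector centrality discussed above.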

PageRank Algorithm
Although Katz centrality extends eigenvector centrality to account for directed networks, it assumes that all vertices pass their full flows into the i-th vertex. However, if a vertex does not pass all of its flow to others, we should restrict each linked vertex so that it only receives a fraction of the flow from others. Hence, we can use the PageRank algorithm and re-write Equation (9) to represent the above description as follows [25]:

PR(v_i) = α Σ_j (A_{j,i} / d^out_j) PR(v_j) + (1 - α)/n,   (10)

where α is the damping factor and d^out_j denotes the out-degree of the j-th vertex. The reason we divide A_{j,i} by the out-degree is to normalize the adjacency matrix into a stochastic matrix. In addition, we can re-write Equation (10) in matrix form as follows:

PR(v) = α A^T D^{-1} PR(v) + ((1 - α)/n) 1,   (11)

where D = diag(d^out_1, d^out_2, ..., d^out_n) denotes the out-degree matrix and A^T D^{-1} is a column stochastic matrix. Note that since A^T D^{-1} is a column stochastic matrix, the damping factor α should be less than one to ensure that (I - α A^T D^{-1}) is invertible. Although many variants of PageRank have been proposed, the cores of these algorithms are similar.
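Equations (10) and (11) can be sketched as a short fixed-point iteration; the dangling-vertex guard and tolerance below are our own assumptions:

```python
import numpy as np

def pagerank(A, alpha=0.85, iters=200, tol=1e-12):
    # Iterate PR = alpha * (A^T D^{-1}) PR + (1 - alpha)/n until it settles.
    n = A.shape[0]
    d_out = A.sum(axis=1)
    d_out[d_out == 0] = 1.0           # guard against dangling vertices
    M = A.T / d_out                   # M[i, j] = A[j, i] / d_out[j], column stochastic
    pr = np.full(n, 1.0 / n)
    for _ in range(iters):
        nxt = alpha * M @ pr + (1 - alpha) / n
        delta = np.abs(nxt - pr).sum()
        pr = nxt
        if delta < tol:
            break
    return pr
```

Because M is column stochastic and α < 1, the iteration contracts and converges to the unique PageRank vector.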

HITS Algorithm
Hypertext-induced topic search (HITS) is another popular algorithm, proposed by Kleinberg [26] to rank web pages. The major characteristic of HITS is that it divides the influence of a vertex into an authority and a hub, where the authority measures the degree to which other vertices point to an ego and the hub reflects the degree to which an ego points outward to others. The concepts of the authority and hub can be represented, respectively, as follows:

auth(v_i) = Σ_{v_j -> v_i} hub(v_j),   (12)

hub(v_i) = Σ_{v_i -> v_j} auth(v_j),   (13)

where v_j -> v_i indicates that vertex v_j points to v_i. We can calculate the score vectors of the authority and hub of vertices, respectively, as follows:

auth(t+1) = c(t) A^T A auth(t),   (14)

hub(t+1) = c'(t) A A^T hub(t),   (15)

where A^T A and A A^T are called the authority and hub matrices, respectively, and c(t) and c'(t) are constants that normalize the authority and hub score vectors. From Equations (14) and (15), it can be seen that the HITS algorithm calculates the eigenvectors of A^T A and A A^T. The HITS algorithm highlights that the centrality of a vertex should consider two different forces, namely, the authority and the hub. However, it only proposes indices to measure the authority and hub centralities separately, without a synthesized centrality.
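The HITS updates in Equations (14) and (15) amount to alternating matrix-vector products with normalization, as in this sketch (function name and iteration count are illustrative):

```python
import numpy as np

def hits(A, iters=100):
    # Alternate authority/hub updates, normalizing each round; the fixed
    # points are the leading eigenvectors of A^T A and A A^T, respectively.
    n = A.shape[0]
    auth = np.ones(n)
    hub = np.ones(n)
    for _ in range(iters):
        auth = A.T @ hub            # vertices pointed at by good hubs
        auth /= auth.sum()
        hub = A @ auth              # vertices pointing at good authorities
        hub /= hub.sum()
    return auth, hub
```

The two returned vectors illustrate the point made above: HITS yields two separate rankings rather than one synthesized centrality.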

Fuzzy Cognitive Map
The fuzzy cognitive map (FCM) approach was proposed by Kosko [27] to extend cognitive maps [28] by considering fuzzy degrees of interrelationship between concepts. The FCM reflects the influence of vertices, called concepts here, on others via cause-effect relationships, which are quantified and usually normalized to the [-1, 1] interval.
Let w_ij ∈ [-1, 1] be the degree of influence from the i-th concept, C_i, to the j-th concept, C_j, where the sign indicates positive or negative influence, -1 denotes a fully negative impact, and 1 expresses a fully positive impact. Then, the state of a concept, x, can be calculated by the following equation:

x_j(t+1) = f( Σ_{i=1, i≠j}^{n} w_ij x_i(t) ),   (16)

where n is the number of concepts and f(·) denotes the transfer function that squashes the result of the multiplication into a specific range, e.g., [0, 1] or [-1, 1]. Usually, bivalent, trivalent, and sigmoid functions are used in the FCM. A modified version of Kosko's FCM was proposed by Stylios and Groumpos [29] to consider the previous value of each concept, i.e., to observe the self-loop effect. Hence, Equation (16) can be modified and extended as follows:

x_j(t+1) = f( x_j(t) + Σ_{i=1, i≠j}^{n} w_ij x_i(t) ).   (17)

Another variant of the FCM, used for rescaled inference, is presented as follows:

x_j(t+1) = f( (2x_j(t) - 1) + Σ_{i=1, i≠j}^{n} w_ij (2x_i(t) - 1) ).   (18)

The positive and negative influences in the FCM indicate that the centrality of a vertex within a graph should consider two opposite forces when aggregating the final centrality.
Let us consider an example to illustrate the above concept. Assume a graph is given as shown in Figure 1.
The influences between the concepts are quantified by an expert and are shown in Table 1. In order to show the different centralities of positive and negative influences, we consider all positive and negative influences, respectively, in the influence matrix. Then, we use the modified Kosko's inference rule and logistic function to derive the influences and ranks of the concepts, as shown in Table 2.
Table 1. Influence matrix of the concepts.
Table 2 shows that the positive and negative influences from concept to concept exert two opposite forces on the synthesized centrality of a concept in the FCM. In addition, the transfer functions squash the influence of vertices into a specific range via a nonlinear function. However, the FCM is not adequate for handling our problem directly, because the influence matrix of a social network is usually unavailable; the only information we can obtain is the graph of a social network, i.e., the adjacency matrix. In addition, negative influences between vertices are not considered, due to the lack of such information. However, the concept of transfer functions can be incorporated into the proposed algorithm.
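The modified-Kosko update of Equation (17) can be sketched as follows, using the logistic transfer function as an example and assuming the influence matrix W has a zero diagonal:

```python
import numpy as np

def fcm_step(x, W, f=lambda s: 1.0 / (1.0 + np.exp(-s))):
    # One modified-Kosko update (Equation (17)): each concept keeps its
    # previous value and adds the weighted influences of the other concepts
    # (W[i, j] is the influence of concept i on concept j); the sum is
    # squashed by a transfer function, logistic by default.
    return f(x + W.T @ x)

def fcm_run(x0, W, steps=50):
    # Iterate the map until the concept values settle near a fixed point.
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = fcm_step(x, W)
    return x
```

With the logistic transfer function, all concept values stay in (0, 1), which mirrors the squashing behavior discussed above.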

Bi-Directional Influence Maps (BIM)
Reviewing the previous research, we can conclude that two types of importance of a node should be identified, namely, the authority and the hub [16]. An authority can be defined by other vertices' inflows to an ego, and a hub can be measured by the total outflow pointing to others. However, previous models focus on only one side; here, we incorporate both forms of information to form the centrality of a vertex as follows:

M(v_i) = γ Σ_j Â^in_{j,i} M(v_j) + (1 - γ) Σ_j Â^out_{j,i} M(v_j),   (19)

where Â^in_{j,i} and Â^out_{j,i} denote the inflow and outflow influence matrices, which have been normalized to stochastic matrices, and γ and (1 - γ) denote the weights of the authority and hub. Note that we use the influence matrices above rather than the adjacency matrix because we will consider another method to modify the conventional adjacency matrix for more rational results. This is described in detail below.
Assume that a network structure is depicted as shown in Figure 2, where the circles denote different vertices and the values q_ri denote the flows from vertex r to vertex i. We consider the centrality of a vertex in terms of two factors, namely, the amounts of inflow and outflow. In addition, we define a reference of vertex i, R, as a vertex that links to vertex i (e.g., vertices r and s). For example, in the path from r to i, denoted by r -> i, vertex r is a reference of vertex i.
The inflow from vertex j to vertex i at time t can be defined by the following equation:

Â^in_{j,i} = A_{j,i} / I_i,   (20)

where Â^in_{j,i} ∈ (0, 1] and I_i indicates the input degree (number of inflows) of vertex i. Then, the outflow from vertex i to vertex j at time t can be calculated as follows:

Â^out_{j,i} = A_{i,j} / O_i,   (21)

where Â^out_{j,i} ∈ (0, 1] and O_i indicates the output degree (number of outflows) of vertex i. After obtaining the above indices, we can construct the inflow and outflow matrices, in which we reflect the influence of the feedback flows. We use the following example to demonstrate the indices defined here. This example has six vertices and contains directed and feedback links between vertices, as shown in Figure 3.
The inflow and outflow matrices for this example can be derived accordingly. Next, if we consider the update process from one status to the next, this can be represented by a transition function. We propose the final model here as follows:

M(v_i(t+1)) = f( γ Σ_j Â^in_{j,i} M(v_j(t)) + (1 - γ) Σ_j Â^out_{j,i} M(v_j(t)) ),   (22)

where f(·) is a transition function, e.g., a sigmoid or linear function. Sigmoid functions, e.g., the logistic or hyperbolic-tangent functions, are widely used in many methods, e.g., neural networks and fuzzy cognitive maps, to squash values into a specific range. For example, the logistic function squashes any real number into (0, 1) and the hyperbolic-tangent function squashes any real number into (-1, 1). Sigmoid functions are popular because they can reflect real-world situations. However, conventional sigmoid functions are not suitable here, since the centrality of a vertex is always positive and falls into the range [0, 1] rather than the whole real line. Note that M(v_i(t+1)) in Equation (22) satisfies M(v_i(t+1)) ∈ (0, 1) and Σ_{i=1}^{n} M(v_i(t+1)) = 1. Hence, in this paper, we introduce two sigmoid functions, namely, the smoothstep and inverted smoothstep functions, to reflect the s-shaped behavior of updated centralities and restrict the input range to [0, 1], as shown in Figure 4.
In addition, we also consider the softmax and restricted logistic functions to see how they differ from the linear function. The transition functions used in this paper are summarized in Table 3.

Table 3. Transition functions and mathematical equations.

Transition Function      Mathematical Equation
Linear                   f(x) = x
Softmax                  f(x_i) = exp(x_i) / Σ_j exp(x_j)
Smoothstep               f(x) = 3x^2 - 2x^3
Inverted smoothstep      the functional inverse of the smoothstep
Restricted logistic      the logistic function restricted to [0, 1]
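The transition functions in Table 3 can be sketched as below; since the exact forms of the restricted logistic and inverted smoothstep functions are not fully given in the recovered text, the inverted smoothstep here is taken to be the closed-form functional inverse of the smoothstep, which is an assumption on our part:

```python
import numpy as np

def linear(x):
    return x

def smoothstep(x):
    # S(x) = 3x^2 - 2x^3, an s-shaped map from [0, 1] onto [0, 1]
    return 3 * x**2 - 2 * x**3

def inverted_smoothstep(y):
    # Closed-form functional inverse of the smoothstep on [0, 1]
    # (assumed form; satisfies smoothstep(inverted_smoothstep(y)) = y)
    return 0.5 - np.sin(np.arcsin(1.0 - 2.0 * y) / 3.0)

def softmax(x):
    # Normalizes a score vector into a probability distribution
    e = np.exp(x - np.max(x))   # shift for numerical stability
    return e / e.sum()
```

All four maps keep centralities in a bounded, positive range, which matches the requirement stated above that the centralities sum to one and stay in (0, 1).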

Next, for simplicity, we can define:

Ā = γ Â^in + (1 - γ) Â^out.   (23)

Since Â^in and Â^out are column stochastic matrices, the linear combination Ā is also a column stochastic matrix. Then, we can rewrite Equation (22) in matrix form as follows:

M(v(t+1)) = f( α Ā^T M(v(t)) + ((1 - α)/n) 1 ),   (24)

where we set α = 0.85, as suggested for the PageRank algorithm. Note that in large-scale networks, we can first set A^T = softmax(A^T) to avoid convergence problems of the proposed algorithm before processing Equation (24). We can highlight the differences between the PageRank and bi-directional influence map (BIM) algorithms as follows. First, the PageRank algorithm considers the importance of a vertex via all the paths from others, i.e., the inflow matrix only. The BIM algorithm, by contrast, considers both the inflow and outflow matrices to balance the two influences. In addition, taking the above graph as an example, the inflow matrices of the two algorithms (A^T D^{-1} for the PageRank algorithm; (Â^in)^T for the BIM algorithm) also show distinct differences:

[ 0.000 0.500 0.000 0.500 0.000 0.000 ]
[ 0.000 0.000 0.000 0.000 0.500 0.000 ]
[ 0.333 0.000 0.000 0.000 0.000 0.500 ]
[ 0.333 0.000 0.500 0.000 0.000 0.000 ]
[ 0.333 0.500 0.000 0.000 0.000 0.500 ]
[ 0.000 0.000 0.500 0.500 0.500 0.000 ]

Next, we can use the proposed algorithm to handle the example in Figure 3 and rank the vertices under different settings of the parameters, i.e., the transition functions and γ, and compare the results with the PageRank algorithm, as shown in Table 4. The results in Table 4 indicate that the weights of the inflow and outflow matrices exert different forces on the importance of a vertex. If we only consider the inflow matrix, the ranking result is similar to that of the PageRank algorithm. By contrast, if only the outflow matrix is used, the ranking result is the reverse of that of the PageRank algorithm.
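Putting the pieces together, the whole BIM iteration of Equations (22) and (24) can be sketched as follows; the in/out-degree normalizations and the zero-degree guard are assumptions reconstructed from the text, not the paper's exact implementation:

```python
import numpy as np

def bim_centrality(A, gamma=0.5, alpha=0.85, f=None, iters=200):
    # Sketch of the BIM update under the normalizations assumed here:
    #   inflow matrix  A_in[j, i]  = A[j, i] / in_degree(i)
    #   outflow matrix A_out[j, i] = A[i, j] / out_degree(i)
    #   blended matrix A_bar = gamma * A_in + (1 - gamma) * A_out,
    # then iterate M <- f(alpha * A_bar^T M + (1 - alpha)/n).
    n = A.shape[0]
    d_in = np.maximum(A.sum(axis=0), 1.0)    # guard against zero degrees
    d_out = np.maximum(A.sum(axis=1), 1.0)
    A_in = A / d_in                           # divide column i by in-degree of i
    A_out = A.T / d_out                       # divide column i by out-degree of i
    A_bar = gamma * A_in + (1 - gamma) * A_out
    if f is None:
        f = lambda x: x                       # linear transition by default
    M = np.full(n, 1.0 / n)
    for _ in range(iters):
        M = f(alpha * A_bar.T @ M + (1 - alpha) / n)
        M = M / M.sum()                       # centralities sum to one
    return M
```

Setting gamma = 1 uses only the inflow matrix (PageRank-like behavior), while gamma = 0 uses only the outflow matrix, matching the two extremes discussed above.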
However, we think both forms of information should be considered in the centrality of a vertex. In addition, the different transition functions show consistent results, indicating the robustness of the proposed algorithm.
Next, we can examine the transition functions to understand their convergence and influence on the centralities and ranking of vertices. First, the centralities of the vertices in each iteration are normalized so that they sum to one. Then, we can depict the iterative processes of all transition functions with γ = 0.5, as shown in Figure 5. The proposed model converges very quickly, no matter which transition function is used, and the softmax and inverted smoothstep functions seem to be the better choices here because they separate the centralities of the vertices more distinctly than the others. By contrast, the restricted sigmoid and smoothstep functions can hardly separate the centralities clearly. We should highlight that although the linear function also shows acceptable behavior, the different transition functions do play an important role in determining the ranking of vertices, and the resulting rankings might not be the same.

Empirical Studies
In the empirical studies detailed below, we prepare two datasets to demonstrate the proposed algorithm and compare the results with the eigenvector centrality and PageRank algorithms.

Marvel Universe Dataset
The first dataset used here is the Marvel universe character network, which was proposed by Alberich in 2002 to investigate the structure of a collaboration network. Two Marvel characters are considered linked if they appear in the same comic book or movie. There are 7219 characters and 574,467 edges with 50 connected components in this graph. We can plot the giant connected component (GCC) to see the main structure of the social network, as shown in Figure 6.

Then, we can depict the degree distribution of the graph to see whether the power law relationship of the small-world principle is satisfied, as shown in Figure 7. Figure 7 shows a less significant shape for the power law relationship. Next, we can calculate the descriptive statistics of the network to gain insight into it, as shown in Table 5.
To analyze and calculate the centrality of vertices within the network, we first simplified the network to avoid loops and multiple edges and then set different parameters in our algorithm to see the variety of the ranking results. The ranking results of the PageRank algorithm are also presented to show the differences among the algorithms, as shown in Table 6. Next, we can check the convergence of the centralities derived by the proposed algorithm to understand the robustness of the method. Taking the BIM (softmax, γ = 0.1) model as an example, we can depict the convergence of the top five centralities, as shown in Figure 8. The result indicates that the proposed algorithm converges quickly.

Facebook Dataset
The Facebook dataset demonstrated here was provided by Sharma on Kaggle (https://www.kaggle.com/sheenabatra/facebook-data). The graph contains 4039 nodes and 88,234 edges, with an average degree of 43.6910. The task here is to find the top 10 key persons within the social network using the proposed algorithm and compare the results with other conventional algorithms. First, we can depict the social network as shown in Figure 9.

The social network indicates several main subgroups, and only one connected component can be identified. Next, we can depict the degree distribution of the graph to see whether it satisfies the power law principle, as shown in Figure 10. The graph shows a slight power-law shape, and as such the dataset can be viewed as a small-world network. Then, we can calculate the descriptive statistics of the graph to understand the basic insight of the network, as shown in Table 7. Here, we select several conventional centralities which are commonly used for directed networks to determine the top 10 key persons within the Facebook social network.
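Descriptive statistics such as those reported in Table 7, along with the degree distribution used for the power-law check, can be computed with a short sketch like this (the adjacency-list representation and function names are our own):

```python
from collections import Counter

def describe(adj):
    # adj: dict mapping vertex -> set of neighbors (undirected graph)
    n = len(adj)
    degrees = [len(nbrs) for nbrs in adj.values()]
    m = sum(degrees) // 2             # each undirected edge is counted twice
    return {
        "nodes": n,
        "edges": m,
        "avg_degree": sum(degrees) / n,
        "density": 2 * m / (n * (n - 1)),
    }

def degree_distribution(adj):
    # Histogram of degrees; a roughly straight line on a log-log plot
    # suggests the power-law shape discussed in the text.
    return Counter(len(nbrs) for nbrs in adj.values())
```

For the Facebook graph described above, such a routine would reproduce the reported node count (4039), edge count (88,234), and average degree (about 43.69).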
Next, we set our algorithm with three different transition functions, namely, linear, softmax, and inverted smoothstep, and three different values of γ, namely, 1, 0.5, and 0. We retrieved the top 10 persons and normalized their centralities to observe the transition changes of the centralities. Taking the linear, softmax, and inverted smoothstep functions with γ = 0.5 as examples, we can depict the transition changes of the centralities, as shown in Figure 11.

Figure 11. Transition changes of the centralities for the linear, softmax, and inverted smoothstep transition functions (γ = 0.5).

The result in Figure 11 shows the good convergence of the top 10 centralities. Finally, we can derive the top 10 key persons within the Facebook social network using the popular centrality algorithms for directed graphs and compare the results with the proposed BIM algorithm, as shown in Table 8. Note that in this experiment we only consider the linear, softmax, and inverted smoothstep functions, since they derive more distinct centralities of the vertices.
The empirical results show several advantages of the proposed algorithm. First, based on the results of the Facebook network, the proposed algorithm can derive a synthesized centrality which reflects different perspectives of a key vertex. Second, the results for the Marvel universe network are consistent with those of the PageRank algorithm, and the top five key characters are rational even under different parameters. Third, the transition functions prove useful and diverse for finding key persons within a social network. Finally, the proposed algorithm converges within an acceptable number of iterations.
The limitations of the algorithm can be described as follows. The social networks used in this paper are artificial datasets. Although the results found in the Marvel universe network seem reasonable, it is hard to confirm whether the algorithm is also useful for real data. Hence, further research may consider a real and large network to carefully test the proposed algorithm with different parameters. In addition, the proposed algorithm could also be compared with newer centrality measures, e.g., that of Rodríguez-Velázquez & Balaban [35].

Conclusions
In this paper, we have proposed a new algorithm that calculates the centrality of a vertex based on both the inflow from others to an ego and the outflow from the ego to others, yielding a synthesized index. In addition, we have incorporated non-linear transition functions to account for complicated social relationships. The empirical studies show that the proposed algorithm is more flexible and comprehensive than others, justifying its usefulness. Moreover, the convergence of the centralities reflects the robustness of the proposed algorithm.