In this section, we describe the methods and algorithms used to detect Network Leaders in an enterprise social network (ESN).
In the next sections, we present our model of leadership centrality and brokerage.
5.2. A Multiplex Model of Leadership Centrality
In the previous sections, we justified our preference for an eigenvector model in the style of PageRank to measure the centrality of network users. However, we still face the problem of designing an appropriate model of users’ interactions in the network; in other words, we need to formalize the notion of “bonding” in an ESN such as TamTamy. Once more, we also rely on the sociological literature to identify social behaviors that may indicate leadership.
We illustrate our bonding model with the help of Figure 3. As explained in Section 4, co-workers play three roles: author, commenter, or rater. In agreement with Reference [15], the action of starting (authoring) a thread can be considered an empowerment action, since authors are willing to share knowledge and solicit the collaboration of colleagues. As also remarked in Reference [47], “empowerment is to start a conversation”. In organizational conversations, when talking with employees rather than to them, the achieved interactivity makes the conversation open rather than directive [47] and enables participants to put forward their own ideas.
Note that, in TamTamy, starting a thread is not a role obligation, except for content publishers, who are a small minority. Furthermore, messages whose purpose is simply to broadcast information, rather than to solicit contributions, do not receive comments. Restricting the analysis to threads with at least one comment (referred to hereafter as non-zero threads) provides reasonable support for the hypothesis that the author’s motivation was indeed to involve other partners on specific aspects of their working life. Furthermore, as demonstrated in Section 5.4 on topic analysis, the large majority of non-zero threads actually include discussions on a variety of technical and organizational issues.
The interpretation of the other roles in Figure 3 in terms of leadership qualities is more straightforward. Commenters show their willingness and ability to collaborate. In fact, collaborative leaders [48] do not exert their role just by deciding what to do: rather, they help and encourage the collaborative process to work. Finally, raters express their trust (or distrust), thus acknowledging or questioning the competence of commenters and the authority of authors. As remarked in Reference [49], trustworthy leaders are “rewarded by employees who stretch, push their limits, and volunteer to go above and beyond”.
To represent the mutual reinforcement among an actor’s empowerment, collaboration, and trust centrality, we use a three-way multiplex network. Multiplex networks [50] are a class of networks recently introduced to model systems in which the same set of nodes is connected via more than one type of link. They are a special case of multilayer networks, in which no links are allowed among different layers.
In our multiplex network, the first layer represents the activity of authors; the second layer models the activity of commenters; and the third layer is that of raters (see the small artificial example of Figure 1, depicting a set of three threads and the corresponding multiplex network’s layers).
In the first layer, for any thread initiated by an author i, we add an edge from i to j whenever a user j posts a comment in that thread. This models the fact that author i has empowered user j. Each such edge carries a cumulative empowering weight.
The adjacency matrix E (empowerment matrix) associated with the first layer is left stochastic, with each column summing to 1.
Edges from i to j in the second layer are created whenever a user i posts a comment for the first time in a thread authored by j. Subsequent answers update the weight of the edge. Given the sequential thread structure, comments are considered as implicitly directed to the author, unless the message includes the name of another recipient in the thread (as in the example of Figure 2). Note that authors themselves can add comments during a discussion, and, in this case, they are treated as commenters. Any comment h in a thread has a weight that depends on k, the sequential order of the comment in the thread ($k = 1$ if a user is the first who replied). This models the idea that the first commenter is more competent, or more willing to collaborate, than the others (alternatively, k can be set to represent the difference between the timestamp of the author’s message and that of the comment). The parameters of the weight formula are experimentally set to 0.5. The cumulative collaboration weight of an edge from i to j sums the weights of all answers provided by user i to user j in any thread within a considered timespan W.
The collaboration matrix C (the adjacency matrix associated with the second layer) is, like E, left stochastic.
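The construction of the collaboration layer can be illustrated with a minimal Python sketch. The comment weight used here, alpha + beta / k with both parameters equal to 0.5, is only an assumption chosen because it gives the first reply the largest weight; it is not necessarily the weighting formula adopted in this work. All function and variable names are hypothetical, every comment is treated as directed to the thread author, and self-replies are skipped for simplicity.

```python
from collections import defaultdict

ALPHA, BETA = 0.5, 0.5  # the text sets both parameters to 0.5


def comment_weight(k, alpha=ALPHA, beta=BETA):
    """Assumed weight of the k-th comment in a thread (k = 1 is the first reply)."""
    return alpha + beta / k


def collaboration_matrix(threads, users):
    """Build a column-normalized collaboration matrix as nested lists.

    threads: list of (author, [commenter_1, commenter_2, ...]) in posting order.
    Entry [row][col] holds the weight of the edge from users[col] (commenter)
    to users[row] (author); every non-empty column sums to 1.
    """
    index = {u: n for n, u in enumerate(users)}
    weights = defaultdict(float)
    for author, commenters in threads:
        for k, commenter in enumerate(commenters, start=1):
            if commenter != author:  # simplification: self-replies are skipped
                weights[(commenter, author)] += comment_weight(k)
    size = len(users)
    matrix = [[0.0] * size for _ in range(size)]
    for (src, dst), w in weights.items():
        matrix[index[dst]][index[src]] = w
    for col in range(size):
        total = sum(matrix[row][col] for row in range(size))
        if total > 0:
            for row in range(size):
                matrix[row][col] /= total
    return matrix
```

Under this assumed weighting, the first reply in a thread weighs 1.0 and the second 0.75, so early repliers accumulate more collaboration weight than late ones.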
The third layer models the rating activity of users. Its edges from i to j are weighted with the trust of rater i in user j. We first introduce an intermediate quantity computed over the set of messages generated by user j (either as an author or as a commenter): each message h in this set contributes a positive value if i likes h, a negative value if i dislikes h, and 0 if no opinion is expressed. As for Formula (9), the parameters are set to 0.5. According to Equation (7), a negative value of this quantity indicates distrust. The trust weight is then defined from this quantity.
The T matrix (the adjacency matrix associated with the third layer), named the credibility matrix, is right stochastic.
Together, the three matrices form the 3-way tensor of the multiplex network. The third dimension of the tensor represents the network leadership qualities discussed at the beginning of the section: empowerment, collaboration, and credibility. We now need to measure the centrality of nodes in the multiplex network in order to identify highly central agents.
A simple approach would be to compute, for every user, the empowerment, collaboration, and credibility ranks using monoplex PageRank [22], and then to combine these indicators with some heuristic function, for example, a regression, as in Reference [9]. However, a better assumption is to postulate a mutual reinforcement relation among layers, i.e., that the centrality of each node in one layer affects the centrality of the same node in any other layer. Therefore, to compute our measure of network leadership centrality, we use Multiplex PageRank (MPR), introduced in Reference [43].
First, we note that PageRank centrality depends on the in-degree of nodes, while, according to our formulation, the empowerment and collaboration of a node depend on the weight of its outgoing edges. Therefore, we need to invert the direction of edges in the corresponding graphs. We denote with $E^{\top}$, $C^{\top}$ (where $\top$ denotes the transpose), and T the slices of the tensor, and we refer to their entries as the corresponding matrix cells.
Our formulation of MPR follows the interaction model of Figure 3. The rank in the credibility layer corresponds to the standard monoplex PageRank formulation with teleporting: as shown in Figure 3, the activity of raters is not influenced by the other layers. Note that this is not true in general, since a rater might be influenced by the role of the rated node (an employee might be more prone to place a “like” on a thread or comment by his/her boss); however, we here assume objectivity of raters.
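For readers less familiar with it, standard monoplex PageRank with teleporting can be sketched in a few lines of Python. This is a generic textbook power iteration, not the implementation used in this work; the damping value 0.85, the uniform handling of dangling nodes, and the function name are assumptions. The function expects a column-stochastic matrix, so a row-stochastic matrix such as T would have to be transposed before being passed in.

```python
def pagerank(matrix, damping=0.85, tol=1e-12, max_iter=1000):
    """Power iteration for PageRank with uniform teleporting.

    matrix[i][j] is the probability of moving from node j to node i
    (column-stochastic). Columns that sum to 0 (dangling nodes) are
    treated as uniform, a common convention assumed here.
    """
    n = len(matrix)
    dangling_nodes = [j for j in range(n)
                      if sum(matrix[i][j] for i in range(n)) == 0]
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        dangling_mass = sum(rank[j] for j in dangling_nodes)
        new_rank = []
        for i in range(n):
            walk = sum(matrix[i][j] * rank[j] for j in range(n))
            new_rank.append(damping * (walk + dangling_mass / n)
                            + (1 - damping) / n)
        if sum(abs(a - b) for a, b in zip(new_rank, rank)) < tol:
            return new_rank
        rank = new_rank
    return rank
```

The teleporting term (1 - damping) / n is what makes every node reachable from every other node, which is the property the convergence argument relies on.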
In each of the other two layers, empowerment and collaboration, we include the multiplicative and additive effect of the other two layers, in analogy with Reference [43] (in Halu et al.’s formulation, any layer k is impacted only by the previous layer $k-1$). The equations use an average operator and two exponents that are set to tune the influence of one layer on the others. Equations (8) and (9) can be interpreted as follows: the first term shows that the rank of a node i is determined by the discounted rank of its neighbors in the same layer (as in the original PageRank formulation), multiplied by its average rank in the other two layers raised to the power of the first exponent. The second term reflects the contribution to node i’s rank deriving from its average importance in the other two layers, raised to the power of the second exponent. By adding the second term, nodes that are not able to attract important nodes in one layer can still gain importance by virtue of their centrality in other layers.
The second term represents the multiplex formulation of teleporting, where the teleporting factor is layer-dependent. Note that, since layers in a multiplex network are not connected (contrary to the more general category of multilayer networks), teleporting does not allow a random walker to jump from a node of one layer to a node of another layer, though the probability of the destination node is influenced by its rank in the other layers.
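The interplay of the multiplicative and additive terms can be made concrete with a hedged Python sketch, written by analogy with the Multiplex PageRank of Halu et al. rather than taken from the equations of this work. The per-source normalization, the exponent names beta and gamma, the damping value, the final renormalization, and the alternating update schedule are all assumptions, and the function names are hypothetical. Each matrix is indexed so that matrix[i][j] is the weight of the link from j to i, i.e., after the edge inversion described above.

```python
def biased_pagerank(matrix, bias, beta=1.0, gamma=1.0, damping=0.85, iters=200):
    """PageRank on one layer, biased by node scores coming from other layers.

    With a constant bias this reduces to standard PageRank with uniform
    teleporting (up to the treatment of dangling nodes).
    """
    n = len(matrix)
    mult = [b ** beta for b in bias]   # multiplicative effect on incoming flow
    add = [b ** gamma for b in bias]   # additive effect (biased teleporting)
    add_total = sum(add)
    # Normalizer of each source node j, so that its outgoing flow sums to 1.
    norm = [sum(matrix[r][j] * mult[r] for r in range(n)) for j in range(n)]
    rank = [1.0 / n] * n
    for _ in range(iters):
        new_rank = []
        for i in range(n):
            walk = sum(mult[i] * matrix[i][j] * rank[j] / norm[j]
                       for j in range(n) if norm[j] > 0)
            new_rank.append(damping * walk
                            + (1 - damping) * add[i] / add_total)
        total = sum(new_rank)          # renormalize: dangling nodes lose mass
        rank = [r / total for r in new_rank]
    return rank


def multiplex_ranks(emp, col, tru, rounds=50, **kwargs):
    """Alternate updates of the empowerment and collaboration ranks."""
    n = len(tru)
    r_t = biased_pagerank(tru, [1.0] * n, **kwargs)  # raters: no cross-layer bias
    r_e, r_c = [1.0 / n] * n, [1.0 / n] * n
    for _ in range(rounds):
        r_e = biased_pagerank(emp, [(c + t) / 2 for c, t in zip(r_c, r_t)], **kwargs)
        r_c = biased_pagerank(col, [(e + t) / 2 for e, t in zip(r_e, r_t)], **kwargs)
    return r_e, r_c, r_t
```

In this sketch the credibility ranks are computed once, since that layer is not influenced by the others, while the empowerment and collaboration ranks are refined in turn, each using the average of the node's ranks in the other two layers as bias. A fixed number of rounds is used because no convergence guarantee is claimed for the alternating scheme.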
The system of Equations (7)–(9) can be solved with a stationary iterative method, which repeatedly updates the rank vector $r^{(t)}$ at iteration t using A, the matrix of coefficients of the linear system. We note that the three matrices $E^{\top}$, $C^{\top}$, and T are stochastic, which is a necessary, but not sufficient, condition for convergence. However, as for monoplex PageRank, teleporting is used to force primitivity of the original stochastic matrix, which ensures convergence of the iterative method according to the Perron-Frobenius theorem (https://www2.warwick.ac.uk/fac/sci/maths/people/staff/oleg_zaboronski/fm/pf_theory.pdf, accessed on 1 September 2021), thus deriving the three rank vectors r.
To efficiently calculate stationary values, an iterative “divide et impera” strategy is adopted, in line with References [43,51] (details are omitted for brevity; interested readers should refer to those works). Efficiency is crucial since, even if the size of an enterprise network is far smaller than that of a world-wide network, we are interested in generating a real-time ranking of users according to variable parameters, as mentioned later in Section 6.
Stationary values for Equations (7)–(9) are combined in a balanced way to compute the leadership centrality rank (see Equation (1)) for all co-workers.
5.4. Topic Leaders
The previously described leadership measures can be applied to an enterprise network in general; however, it may be of interest to assess leadership with reference to specific topics. A user may be highly credible when he/she discusses, e.g., mobile apps, and be much less confident about business models. Consequently, network leadership should also be analyzed in the context of users’ competence. To this end, our aim is to extract topic networks, i.e., networks of users focused on specific topics.
In TamTamy, the content of threads, besides being multilingual and mixed-lingual, also differs greatly in the type of topics discussed. A few threads are on leisure topics (for example, the organization of a football match); however, the majority are on technical or administrative topics. We perform topic extraction in two steps: first, by learning relevant terminology from threads; and, second, by generating clusters of co-occurring terms. Finally, we also classify topics into more general categories.
A common topic learning approach in the literature is to use stemmed words as items, and then to cluster items using a latent topic model (such as Latent Dirichlet Allocation (LDA) [53] or one of its many variants). This solution turned out to perform poorly, due both to mixed linguality and to the small size of messages. To extract more meaningful terms (with reference to the company’s competences and topics), we first index only “content” tokens, or concepts, identified as those words that map to BabelNet [54], a freely available semantic network covering more than 50 languages and more than 13 million concepts. In this way, the specific language in which a concept is expressed does not matter. Then, we extract concept n-grams that are either consecutive (i.e., compounds) or separated by prepositions and determiners. For example, in the sentence: “This technology has been around for over three years and has been used in Macy’s for marketing purposes for months now”, only the bold tokens are indexed.
Finally, we extract concept cliques using the Bron-Kerbosch clique detection algorithm, as described in Reference [55], with the restriction that each element in the clique should exceed an experimentally defined frequency threshold. An example of a topic is the following:
investimenti_online | digital_marketing |
perspective_engage | analisi_delle_performance |
Note in the example the presence of several “mixed language” concepts (e.g., investimenti online, analisi delle performance), in which words are either in English or Italian. This is rather common in work environments where the usage of English technical terms dominates. Topics are extracted within temporal windows of length W (we experimented with different values of W, ranging from weeks to years). Cosine similarity is used to cluster topics vertically (within the same W) and horizontally along the temporal line, in order to generate topic streams. An example of two topics assigned to the same stream is listed below:
TOPIC#:169 | TOPIC#:162 |
user_experience | mobile_pos |
news_pay | pay_reply |
carte_di_credito | user_experience |
credit_card | soluzione_di_mobile_pos |
metodo_di_pagamento | sistemi_di_pagamento |
american_express | circuiti_di_pagamento |
pagamenti_online | gestione_coupon |
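The two building blocks of topic extraction, clique detection and cosine similarity between topics, can be sketched in Python as follows. This is an illustrative sketch only: it uses the basic Bron-Kerbosch recursion without pivoting, represents topics as unweighted sets of concepts, and relies on hypothetical function names and inputs (a list of co-occurring concept pairs and a frequency dictionary). The actual variant, concept weighting, and thresholds used in this work may differ.

```python
import math


def bron_kerbosch(r, p, x, adj, out):
    """Basic Bron-Kerbosch recursion (no pivoting); collects maximal cliques."""
    if not p and not x:
        out.append(set(r))
        return
    for v in list(p):
        bron_kerbosch(r | {v}, p & adj[v], x & adj[v], adj, out)
        p = p - {v}   # v has been fully explored
        x = x | {v}   # exclude v from later cliques at this level


def concept_cliques(cooccurrence, frequency, min_freq):
    """Maximal cliques among concepts whose frequency exceeds min_freq."""
    kept = {c for c, f in frequency.items() if f > min_freq}
    adj = {c: set() for c in kept}
    for a, b in cooccurrence:
        if a in kept and b in kept and a != b:
            adj[a].add(b)
            adj[b].add(a)
    out = []
    bron_kerbosch(set(), set(kept), set(), adj, out)
    return out


def cosine(topic_a, topic_b):
    """Cosine similarity between two topics seen as binary concept sets."""
    if not topic_a or not topic_b:
        return 0.0
    return len(topic_a & topic_b) / math.sqrt(len(topic_a) * len(topic_b))
```

With this unweighted representation, the two example topics above share one of their seven concepts each (user_experience), so their cosine similarity would be 1/7; a weighted representation of concepts would generally give a different value.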
Since topic extraction is not central to the scope of the paper (alternative topic extraction methods could be used), we do not compare our algorithm in detail with other topic detection algorithms in the literature. However, topics extracted with different methods were comparatively evaluated by our project partners in Reply (this was the only possible option for evaluation, since many keywords are obscure for external evaluators), who found the solution proposed here to produce significantly more meaningful topics than, e.g., LDA. Given a topic stream within a temporal window W, we are then able to generate the network of users participating in that stream, and to derive all measures described in the previous section with reference to this network. This is particularly relevant as far as credibility and collaboration are concerned, since both ranks may depend on specific discussion topics.
Finally, we aim to classify both topics and threads according to three macro-categories, specifically:
technical (T), e.g., mobile applications, big data, responsive design;
organization/administrative (O/A), e.g., project management, potential customers, team building;
leisure (L), e.g., xmas party, blood donation, birthday party.
In order to perform this task, we first manually annotated about 300 keywords in each of the three categories; next, we created a context vector for each keyword based on its co-occurring keywords in threads; and, finally, we learned a contextual model for each category (i.e., the centroid of member keywords). To classify keywords, we compute the cosine similarity between their context vector and the category vector, and we assign a keyword to a category if the similarity exceeds a threshold (see Section 5.5). Next, based on keywords’ categories, we compute the score of a topic in each of the three categories from the topic’s keywords and their weights. A topic is then assigned to a category based on these scores.
Note that not only a topic but also a thread can be classified in the very same way. Since the objective of the classification is to analyze users’ leadership with reference to the three macro-categories, we only assign a category to a thread or topic if the normalized score of the top-ranked category exceeds that of the second-ranked category by 40%. This allows us to analyze only threads and topics that are more “focused”; furthermore, we obtain a high classification performance: we estimated 92% precision on a random sample of 100 topics and 100 threads. Overall, we automatically classified 4264 non-zero threads (see Section 5.2) and 393 topics.
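The category assignment rule can be illustrated with a short Python sketch. Two points are assumptions rather than details confirmed here: the score of a topic in a category is taken to be the sum of the weights of its keywords classified in that category, and the 40% rule is read as a gap of at least 0.40 between the normalized scores of the two best categories. Function names and the sample keywords' weights are hypothetical.

```python
CATEGORIES = ("T", "O/A", "L")  # technical, organization/administrative, leisure


def topic_scores(keywords, keyword_category):
    """Sum keyword weights per category (assumed scoring rule).

    keywords: dict keyword -> weight within the topic or thread.
    keyword_category: dict keyword -> category, for classified keywords only.
    """
    scores = {c: 0.0 for c in CATEGORIES}
    for keyword, weight in keywords.items():
        category = keyword_category.get(keyword)
        if category in scores:
            scores[category] += weight
    return scores


def assign_category(scores, margin=0.40):
    """Return the top category if its normalized score leads by margin, else None."""
    total = sum(scores.values())
    if total <= 0:
        return None  # no classified keyword: leave the item unassigned
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    (best, best_score), (_, second_score) = ranked[0], ranked[1]
    if (best_score - second_score) / total >= margin:
        return best
    return None
```

For instance, a topic whose classified keywords give scores of 3.0 (T), 1.0 (O/A), and 0.5 (L) would be labeled T under these assumptions, whereas a topic with scores of 2.0 and 1.5 in its two best categories would be left unassigned as insufficiently focused.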