An Enterprise Social Analytics Dashboard to Support Competence Valorization and Diversity Management

Abstract: This paper describes an Enterprise Social Analytics Dashboard (ESAD) to support human capital management, competence valorization, diversity management, and early detection of potential problems within large, networked organizations. The system can be used by managers for career promotion, team building, and diversity management, as well as by company's social analysts, to monitor social behaviors and information flow in the workplace. Toward this end, we defined a measure of informal leadership which draws on organization theory and on a computational model based on multiplex networks. This model, along with a social network analysis toolkit developed in the context of the present study, enabled the systematic empirical analysis of social behaviors in a three-year dataset of message threads exchanged within a large multinational enterprise, as a function of gender, time, roles, and discussed topics. The results of our empirical analysis demonstrate the power of social analytics in organizations as a tool for human capital management, competence valorization, and early detection of potential problems. Our study clearly shows that Enterprise Social Networks are a favorable environment to highlight women's leadership qualities and intermediary abilities. The ESAD offers innovative features, such as a sociologically motivated leadership model based on multiplex networks, text mining, and text classification techniques, to extract relevant discussion topics.


Introduction
Social media are increasingly adopted in work organizations as tools for communication among employees [1,2]. Among other benefits, Enterprise Social Networks (ESNs) (i) help to spread knowledge and share best practices among employees of multinational companies [3], (ii) help to build employees' relationship capital [4], (iii) help to promote employees' creativity and well-being [5], and (iv) help employees be more productive [6] (negative aspects of ESNs have also recently been investigated [7]). As a consequence, a growing number of enterprises, such as General Motors (https://www.gm.com/, accessed on 1 September 2021), IBM (https://www.ibm.com/, accessed on 1 September 2021), Microsoft (https://www.microsoft.com/, accessed on 1 September 2021), and others, encourage their employees to actively participate in their social networking sites. Remarkably, in a recent survey, 90% of executives whose companies use social technologies reported measurable benefits from these tools (http://www.mckinsey.com/business-functions/business-technology/ourinsights/evolution-of-the-networked-enterprise-mckinsey-global-survey-results, accessed on 1 September 2021), such as effects on employees' skills, productivity, knowledge, and motivational level.
While many studies have been concerned with the large-scale analysis of "external" social platforms to inform business and marketing strategies (for example, Reference [8]), very few published results are available on large-scale analytics of enterprise platforms [9], and none have proposed dedicated methodologies to exploit "internal" social data for the purpose of knowledge sharing, human capital management, and competence valorization. For example, available studies [10] confirm that active participation in a company's social networks is indeed positively evaluated for career advancement; however, managers are not adequately supported in their choices by current social platforms, which only provide simple measures such as user centrality, number of authored/commented messages, etc. Support for managerial decisions would also be valuable to select a project leader and his/her team based on leadership and competence on the project themes, to foster diversity management, and to promptly detect problems, such as lack of communication between groups or an unexpected reduction of activity by some influential actor. All these decisions are relevant but remain largely unsupported by state-of-the-art enterprise social analytic dashboards (ESADs).
The present work addresses this gap and provides the following contributions:
• We define a computational model to identify network leaders and topic leaders, which draws on leadership models in organization science and on multiplex networks. The model parameters can be tuned to balance, in different ways, the needs and priorities identified by enterprise managers.
• We design an ESAD that incorporates the model of network leadership and supports the extensive analysis of social behaviors in networked enterprises, to detect differences, commonalities, and anomalies in the behaviors of co-workers as a function of gender, role, time, and areas of expertise [11]. The system can be used by managers for career promotion, team building, and diversity management, as well as by a company's social analysts to monitor social behaviors and information flow in the workplace.
• We consider the case study of a large networked enterprise, Reply (http://www.reply.com/en/, accessed on 1 September 2021), which provided us with a three-year dataset of multilingual users' communications within their proprietary social network, TamTamy. Our use case (to the best of our knowledge, the largest and longest enterprise dataset used in the literature) demonstrates that this type of "internal" social analytics may deliver more timely and actionable insights for human capital management, diversity management, and early detection of potential problems.
The innovative feature of the proposed analytic platform is the definition of a sociologically motivated, fine-tunable measure to identify network and topic leaders, which we believe to be useful to support competence valorization in large networked enterprises. We remark that our notion of leadership is limited to what can be inferred from the large-scale computational analysis of communications and social interactions in an ESN, since the actual dynamics, promotion strategies, and managerial decisions represent confidential information that enterprises do not commonly disclose: in this respect, our computational approach complements existing results of conventional leadership research.
We further remark that the main contribution of this research is not algorithmic, although algorithms are used to compute some useful measures and to conduct data analytics. Rather, our contribution lies in proposing a leadership model that is based on the insights of organizational science and, moreover, can be adapted (by tuning some of the model's parameters) to meet management's specific needs and perspectives. A second relevant contribution is the insight acquired during the data analytic phase supported by the proposed platform. To the best of our knowledge, this is the largest study (for the temporal dimension and number of analyzed subjects) on social interaction, competence communication, and diversity of leadership skills in a real enterprise network.
The paper is organized as follows: Section 2 surveys the works most closely related to the present study, Section 3 provides an overview of the proposed framework, Section 4 describes the dataset used in this work, Section 5 describes the leadership detection model and supporting algorithms, and Section 6 briefly summarizes the features of our interactive social analytic platform. Finally, Section 7 is dedicated to the analysis of our use case.

Related Studies
As we already remarked in the introduction, to the best of our knowledge, few studies have been published on the analysis and exploitation of data extracted from enterprise social platforms [11]. A number of ESN platforms are available (see http://dialoguegroup.com.au/brands.html, accessed on 1 September 2021, for a comparison); however, they provide limited analytical features, including activity streams, intelligent search, and, in some cases, decision workflow approval. The purpose of this paper is to present a social analytic framework for networked enterprises, with two novel and useful features: identification of network leaders, to support career advancements; and anomaly detection, to support early detection of potential problems in the workplace. Though these features are novel in the context of enterprise social analytics for decision support, several studies have been concerned either with leadership analysis in social communities or with anomaly detection in organizations. In what follows, we analyze the main contributions to these applicative fields.
Sociological and management literature has paid great attention to the problem of characterizing leadership models in organizations. If we restrict ourselves to informal leadership, scholars are increasingly recognizing the role of social relations and social processes involved in leading [12]. A social network perspective of leadership, referred to hereafter as network leadership, is concerned more with the relationships connecting individuals than with specific qualities of the individual [13]. As further stated in Reference [14], network leadership is more about influence than control, requiring leaders to create a work environment based on autonomy, empowerment, trust, sharing, and collaboration, where empowerment (https://businessjargons.com/empowerment.html, accessed on 1 September 2021) is defined as the management practice of sharing information, rewards, and power with the employees. This quality is considered to be an important indicator of leadership skills in communities [15], in order to foster a greater responsibility of employees through knowledge sharing and participation in decision processes and problem solving.
Network leadership has been commonly analyzed in terms of bonding and bridging [12]:
• Bonding ties are the edges in a social network connecting an individual with other actors in the network.
• Bridging ties are those connecting otherwise isolated groups or communities within the network.
Network leadership indicators have been related to the notions of centrality and brokerage [12,16,17], respectively, measured with reference to an actor's bonding and bridging capabilities. Scholars hypothesize that highly central leaders have "increased influence over the network due to access to multiple resources and the potential to create new linkages that may enhance social capital" [13], and that leaders with a high brokerage capacity are able to go beyond their "power circle" and play an important coordination role.
We may conclude that, in sociological literature on network leadership, effective leaders are identified as those who are both prominent actors within their entourage and able to perceive the existence not only of their surrounding ties but also of ties connecting other groups. To summarize, network leadership = centrality + brokerage.
In our project, we faced the problem of formalizing these qualities in order to produce a quantitative leadership ranking in online ESNs. To the best of our knowledge, this is the first study providing a quantitative analysis of leadership in online ESNs, as a function of gender, role, and areas of expertise. There are, however, three lines of computer science studies closely related to our work. The bulk of research is concerned with the definition of leadership in general-purpose (rather than enterprise) social networks.
The authors in Reference [18] introduce TopLeaders, an algorithm that, first, identifies clusters of connected components using the K-means algorithm. Then, within each cluster, a team leader is identified. A number of other papers (e.g., Reference [19]) identify leaders with reference to their centrality within communities. More recently, the authors in Reference [20] presented a case study, where top leader candidates are more likely to be identified among nodes with high centrality belonging to large communities, where teams need leadership to meet communication and coordination challenges.
In Reference [21], the authors propose an algorithm, named Dynamic Opinion Rank, which is based on PageRank [22] and content analysis. Users are classified on the basis of their expertise on the discussion topics and on the comments (positive or negative) of other users. An "influence degree" is then assigned to each message. Influence is propagated between authors using PageRank. In Reference [23], another variant of PageRank is proposed, named LeaderRank. As in the previous work, the notion of leadership is tightly connected with that of competence, but, rather than modeling competence as a function of message content, the authors use bookmarking, which is available in several discussion networks (in the paper, experiments are conducted on Delicious (https://en.wikipedia.org/wiki/Delicious_(website), accessed on 1 September 2021)).
In a similar vein, in Reference [24], the authors present LeadershipRank, a topological and content-based model to identify topic-specific leaders in social media platforms. In the presented methodology, topics and interests are extracted by means of a probabilistic text classifier named Dynamic Text Classifier Neural Network model [25]. The corresponding LeadershipRank is obtained as a combination of users' interests with a measure of node centrality. Preliminary experiments have been conducted on a collection of Twitter users and their posts with different measures (i.e., node degree, closeness, betweenness, eigenvector, and PageRank).
All the mentioned works are based on the analysis of public social networks and present models of network leadership which are not grounded on organizational studies. For example, emphasis is given to competence in specific topics; however, competence is only one of the leadership qualities, not necessarily the most important, as outlined in existing studies on enterprise leadership.
With reference to our previous survey of sociological literature, we note that eigenvector centrality measures (such as PageRank), rather than simply counting the number of direct ties of individuals, as for the centrality degree, assign a higher value to connections with other prominent actors. It has been noted in organization studies that this type of "borrowed" centrality avoids the perils of too many ties to maintain [13]. Fewer connections may result in a more effective exertion of influence if they are directed to other central actors.
For the purpose of completeness, we also consider a smaller number of papers concerned with users' behaviors in ESNs, rather than specifically with the notion of leadership [26]. In Reference [9], the authors study the influence on the user graph connectivity of several factors, such as hierarchical relations (co-worker, supervisor, etc.), distance between headquarters of communicating users, and other features, on a 6-month dataset collected from a large ESN. In order to quantify the relations between influencing features and network dynamics they use a logistic regression model. The results show that both users' location and hierarchical differences do influence network statistics, such as the number of influential users, the likelihood of interaction between user pairs, and the graph connectivity.
In the same vein, a number of studies analyze users' behavior in social networks according to gender [27]. In Reference [28], the authors present a model to analyze users' behavior in an online game, an application that may share some common traits with enterprise networks. The network models 6 types of positive or negative relationships (friendship, enmity, etc.) between two types of users (male, female). Experimental results show that women accumulate a significantly higher number of credits in the game, but they exhibit a lesser tendency to take risks. Furthermore, they perform a higher number of positive actions, and, as a consequence, they attract more positive actions in return. In Reference [29], the authors explore language differences by gender and role using the emails generated by 158 employees of the Enron Corporation (known as the Enron dataset (https://www.cs.cmu.edu/~enron/, accessed on 1 September 2021)) and show that the manifestations of power differ significantly between genders. Also concerning gender differences, References [30,31] reported that women are almost indistinguishable from men in their network behavior; however, these findings are based on friendship networks. The authors in Reference [32] analyze Tuenti (https://www.tuenti.com/, accessed on 1 September 2021), a Spanish social network service, with a population of about five million males and a comparable number of females. An interesting finding of this study is that females tend to be more homophilous when they have a small or average ego network, as well as when they are in their early stage of participation in the social network.
Though there is general agreement on the higher capacity of women for reciprocating relations and being more collaborative (both by providing content and acting as bridges among communities), mixed results have been obtained concerning the size and density of female networks and their homophily, probably because of the different features and relationship types of the analyzed networks.

System Objectives and Overview
The objective of our work is to define a model of network leadership grounded on organizational studies. In particular, as detailed in Section 5, we consider three qualities of a leader: empowerment, collaboration, and trust. To adapt these qualities to the context of ESNs, we created a correspondence with specific actions that employees can perform in the network, such as authoring, commenting on, or rating a post. Furthermore, by considering only threads in specific domains (e.g., technical, organizational, or administrative), the notion of leadership was complemented with domain competence. We compute this leadership model on a multiplex network structure where users are nodes (labelled with role and gender) and directed links in each layer represent one of the three aforementioned types of actions. Furthermore, rather than computing only the centrality of nodes in each layer, we also compute their brokerage, in agreement with the studies in References [12,16,17], among others. Enterprise managers or social analysts can explore the network to discover leaders, to study the flow of communications within the network, and to analyze the behaviour of network members.
In Figure 1, we depict a use case diagram of the developed ESAD. As previously discussed, three types of actions are identified: authoring (initiating) a thread, commenting on (replying to) a post, and rating a post. This information is used to create a multiplex network where nodes are the members of the network, and each layer highlights the activity of authors, commenters, and raters, respectively.
As already mentioned, the main purpose of the dashboard is to provide effective support for the identification of leaders. While performing this task, managers are supported in the following activities:
• Time/Topic selection: in this step, a subset of threads can be generated by defining a time interval and, optionally, a topic of interest. This makes it possible both to perform a detailed leadership analysis on a time interval and to identify network leaders on specific topics of interest.
• Parameters selection and Multiplex network generation: in this step, managers can control the computation of our Model of Network Leadership (see Section 5 for the details about the model, the weighted multiplex network, and the parameters).
• Leadership analysis and visualization: in this final step, managers are supported in the analysis of the resulting network leaders and, optionally, in order to refine their investigation, can iteratively go back to previous steps by changing the subset of threads or the model's parameters. General properties of the network (or sub-network) can also be investigated, such as connectivity, flow of information, and members' activity, depending on gender, role, and discussion topic.
In the subsequent section, Section 4, we provide details on the ESN considered in our study. Next, in Section 5, we describe the Computational model of Network Leadership, while additional details about the dashboard user interface and functionalities are provided in Section 6.
Figure caption (example of three threads, $\theta_1$, $\theta_2$, and $\theta_3$, and the corresponding multiplex network): User u1 is the author of $\theta_1$, and u3 authored $\theta_2$ and $\theta_3$. In $\theta_1$, u2 and u3 replied to u1, and u1 replied to u3; the user u2 liked the post of u1, and u1 liked the reply of user u3. In $\theta_2$, u1 and u2 replied to u3, and user u2 liked the reply of u1; and, finally, in $\theta_3$, u2 replied to u1, and the user u3 liked the reply of u2 and posted a reply to u2. The corresponding multiplex network (see box at the bottom) is composed of three layers, i.e., $G_1$, $G_2$, and $G_3$, where nodes are users and weighted directed edges represent the activities of authors, commenters, and raters, respectively.

Use Case Description: The TamTamy Enterprise Social Network
The research described in this paper is part of a funded research project which led to the implementation of a Social Analytic layer, named Fiordaliso. We apply our methodology to the use case of a proprietary ESN, TamTamy (http://www.reply.eu/tamtamy-reply/en/, accessed on 1 September 2021).
TamTamy has been designed by Reply (http://www.reply.com/en/, accessed on 1 September 2021), a large international network of specialized companies in the field of digital services, based in the United Kingdom, Italy, Germany, France, Belgium, Poland, and Holland, and now expanding worldwide (USA, Brazil). Given its cluster-based organization, Reply openly declares its focus on online sharing tools, such as TamTamy, since "communities are the key to create engagement, brand awareness and knowledge sharing". Furthermore, interviewed managers reported that being active online is acknowledged as a push toward future promotions. Therefore, TamTamy seems to be an ideal environment for the objectives of the present study.
Though the type of data that can be derived from a social network obviously depends on the specific platform, many of the features of TamTamy are indeed common to other popular enterprise social networks, such as Jive (https://www.jivesoftware.com/products-solutions/jive-n/, accessed on 1 September 2021). Basically, these environments provide a collaboration space to share knowledge and documents, to communicate with co-workers, and to search and find relevant content and people within a company.
Though the TamTamy platform provides a number of services, we focus here mainly on social networking and communication. Messages are started by a thread initiator (the author). The other users (commenters) are then free to contribute and to rate messages with likes and dislikes, in a style akin to Facebook (https://www.facebook.com/, accessed on 1 September 2021). The basic format of each thread is the following:
thread_id, thread_title, thread_author, timestamp, thread_tags, author_title, author_gender (0 = male, 1 = female), thread_content, {comment_id, commenter, commenter_title, gender, timestamp, comment_content}^n
where the block in braces is repeated for each of the n comments in the thread. The thread tags are free, although there exist a number of predefined tags. The title fields (author_title, commenter_title) indicate the author's or commenter's role in the company (manager, consultant, senior consultant, partner, external, content publisher). Partners and managers are executive roles. A thread excerpt (anonymized for privacy reasons) is shown in Figure 2. Note that only rarely is the recipient explicitly indicated in the message; otherwise, answers are meant to be directed to the author of the thread, as on Facebook. In our dataset, threads are in English, German, and Italian, and the text often includes a mix of languages, since many technical terms are in English even when the conversation is in another language. Moreover, the use of mixed terms (e.g., pagamento online) is quite common. The phenomenon of "mixed linguality" (i.e., terminological expressions including words in multiple languages) was one of the main difficulties when analyzing message content (see Section 5.4), since available natural language processing (NLP) tools, such as part-of-speech tagging, cannot be readily used.
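As an illustration, the thread format described above could be represented in Python roughly as follows. This is only a minimal sketch: the class names `Thread` and `Comment`, the helper `is_executive`, the field types, and all sample values are assumptions made for the example and are not part of TamTamy or of the Fiordaliso toolkit.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Comment:
    comment_id: str
    commenter: str
    commenter_title: str   # role in the company, e.g. "consultant"
    gender: int            # 0 = male, 1 = female (encoding from the thread format)
    timestamp: str
    comment_content: str


@dataclass
class Thread:
    thread_id: str
    thread_title: str
    thread_author: str
    timestamp: str
    thread_tags: List[str]
    author_title: str
    author_gender: int     # 0 = male, 1 = female
    thread_content: str
    comments: List[Comment] = field(default_factory=list)


# Partners and managers are the executive roles named in the text.
EXECUTIVE_TITLES = {"partner", "manager"}


def is_executive(title: str) -> bool:
    """Return True if a title field denotes an executive role."""
    return title.strip().lower() in EXECUTIVE_TITLES


# Hypothetical, anonymized record used only to exercise the classes.
example = Thread(
    thread_id="t-001",
    thread_title="Pagamento online: open questions",
    thread_author="u1",
    timestamp="2014-03-01T09:30:00",
    thread_tags=["payments"],
    author_title="Manager",
    author_gender=1,
    thread_content="Which gateway are we using for the new project?",
    comments=[
        Comment("c-001", "u2", "consultant", 0,
                "2014-03-01T10:05:00", "We used the same one last year."),
    ],
)
```

Keeping the comments as a list inside the thread mirrors the repeated comment block of the format and makes it straightforward to count the comments of a thread or to iterate over them in posting order.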
In our study, we used a three-year anonymized dataset of employees' messages, which has been made available for analysis in the context of the Fiordaliso project.

A Computational Model of Network Leadership
In this section, we describe the methods and algorithms used to detect Network Leaders in an enterprise social network (ESN).
In accordance with the sociological and organization science literature surveyed in Section 2, we decided to model network leaders as those individuals showing a high degree of both centrality and brokerage. The Network Leadership Rank is defined as follows:
$$\mathrm{NLR}(u) = \alpha_1 \, \mathrm{LCR}(u) + \alpha_2 \, \mathrm{LBR}(u)$$
where $\mathrm{LCR}(u)$ and $\mathrm{LBR}(u)$ are the leadership centrality and brokerage ranks of an employee $u$. The coefficients $\alpha_1$ and $\alpha_2$ balance these two qualities. We here set $\alpha_1 = \alpha_2 = 0.5$; however, these coefficients, as well as any other parameter in our model, should be set in agreement with the managerial strategies of a company, as a complementary support for decisions on career advancements. We summarize all the parameters in Section 5.5.
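The way the two coefficients balance centrality and brokerage can be sketched in a few lines of Python. The sketch assumes that the rank is a weighted sum of LCR(u) and LBR(u) with weights α1 and α2; the function name `network_leadership_rank` and the numeric scores are hypothetical and serve only as an illustration.

```python
def network_leadership_rank(lcr, lbr, alpha1=0.5, alpha2=0.5):
    """Combine centrality (lcr) and brokerage (lbr) scores per user.

    Assumption: the combination is the weighted sum
    alpha1 * LCR(u) + alpha2 * LBR(u); users missing from one of the
    two dictionaries are given a score of 0 for that quality.
    """
    users = set(lcr) | set(lbr)
    return {
        u: alpha1 * lcr.get(u, 0.0) + alpha2 * lbr.get(u, 0.0)
        for u in users
    }


# Hypothetical scores for three users.
lcr_scores = {"u1": 0.6, "u2": 0.3, "u3": 0.1}
lbr_scores = {"u1": 0.2, "u2": 0.6, "u3": 0.2}

balanced = network_leadership_rank(lcr_scores, lbr_scores)
centrality_only = network_leadership_rank(lcr_scores, lbr_scores,
                                          alpha1=1.0, alpha2=0.0)

# Users sorted from the highest to the lowest combined rank.
ranking = sorted(balanced, key=balanced.get, reverse=True)
```

With equal coefficients the user with strong brokerage (u2) is ranked first, whereas setting α2 to zero rewards centrality alone and puts u1 first, which is the kind of tuning that a company may want to align with its own managerial strategy.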
In the next sections, we present our model of leadership centrality and brokerage.

Centrality Measures
As remarked by organization scientists, centrality is better modeled by PageRank-like measures [22], referred to also as "borrowed glory" centrality in Reference [13]. In computer science literature, several works [33–37] highlighted the superiority of so-called eigenvector centrality measures (such as PageRank) with respect to other simpler centrality measures (e.g., closeness, betweenness, centrality degree) in the task of identifying the important nodes in a graph.
Among these works, Reference [38] showed that PageRank has a high correlation with authors' citation rank in a co-citation network, while centrality measures exhibit a much lower correlation; References [39,40] applied several centrality measures to the task of identifying "important" individuals in the Enron dataset. They assume as ground truth the role of each individual (i.e., managers versus employees) (note that this seems questionable: as we hypothesize and later demonstrate, the informal environment of a social network may sharpen the leadership qualities of employees regardless of their current role). Their conclusion is that [40]: "On their own, none of these metrics captured our intuitive understanding of social network importance. A socially important individual is one who is both connected to a large number of individuals and who has significant correspondences with those individuals." This is precisely the approach undertaken in this paper. Finally, References [41,42] showed the better performance of PageRank and compared it with a variety of alternative centrality measures on several benchmark networks, among which one was a Facebook dataset [42].
A number of PageRank variants have also been applied to multiplex networks (e.g., References [43–45]). For example, Reference [46] extended several centrality measures used for monoplex networks to the case of multilayer networks and highlighted the superior performance of multilayer PageRank in discovering the central nodes in these types of networks.

A Multiplex Model of Leadership Centrality
In the previous sections, we justified our choice of preferring an eigenvector model in the style of PageRank to measure the centrality of network users. However, we are still faced with the problem of designing an appropriate model of users' interactions in the network; in other words, we need to formalize the notion of "bonding" in an ESN, such as TamTamy. Once more, we rely also on sociological literature to identify social behaviors that may indicate leadership.
We illustrate our bonding model with the help of Figure 3. As explained in Section 4, co-workers play three roles: author, commenter, or rater. In agreement with Reference [15], the action of starting (authoring) a thread can be considered as an empowerment action, since authors are willing to share knowledge and solicit the collaboration of colleagues. As also remarked in Reference [47], "empowerment is to start a conversation". In organizational conversations, when talking with employees rather than to them, the achieved interactivity makes the conversation open rather than directive [47] and enables participants to put forward their own ideas.
Note that, in TamTamy, starting a thread is not a role obligation, except for content publishers who are a small minority. Furthermore, messages whose purpose is simply to broadcast information, rather than soliciting contributions, do not receive comments. Restricting to threads with at least one comment (referred to hereafter as non-zero threads) provides a reasonable support to the hypothesis that the author's motivation was indeed to involve other partners on specific aspects of their working life. Furthermore, as demonstrated in Section 5.4, on topic analysis, the large majority of non-zero threads actually includes discussions on a variety of technical and organizational issues.
The interpretation in terms of leadership qualities of the other roles in Figure 3 is more straightforward. Commenters show their willingness and ability to collaborate. In fact, collaborative leaders [48] do not exert their role just by deciding what to do: rather, they help and encourage the collaborative process to work. Finally, raters express their trust (or distrust), thus acknowledging/questioning the competence of commenters and the authority of authors. As remarked in Reference [49], trustworthy leaders are "rewarded by employees who stretch, push their limits, and volunteer to go above and beyond". To represent the mutual reinforcement among an actor's empowerment, collaboration, and trust centrality, we use a three-way multiplex network. Multiplex networks [50] are a class of networks recently introduced to model systems in which the same set of nodes is connected via more than one type of link. They are a special case of multilayer networks, in which no links are allowed among different layers.
In our multiplex network, the first layer, $G_1(N, E_1)$, represents the activity of authors; the second layer, $G_2(N, E_2)$, models the activity of commenters; and the third layer, $G_3(N, E_3)$, is that of raters (see the small artificial example of previous Figure 1, depicting a set of three threads and the corresponding multiplex network's layers).
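To make the three-layer structure concrete, the following Python sketch builds the layers for the small artificial example with three threads, transcribing the replies and likes as they are listed in the figure caption. It is an illustration under simplifying assumptions: edges carry raw action counts rather than the weights of the model, self-loops of authors commenting in their own threads are dropped from the first layer, and the names `EXAMPLE_THREADS` and `build_layers` are hypothetical.

```python
from collections import Counter

# Each reply or like is a (source user, target user) pair,
# transcribed from the caption of the example figure.
EXAMPLE_THREADS = [
    {"author": "u1",
     "replies": [("u2", "u1"), ("u3", "u1"), ("u1", "u3")],
     "likes": [("u2", "u1"), ("u1", "u3")]},
    {"author": "u3",
     "replies": [("u1", "u3"), ("u2", "u3")],
     "likes": [("u2", "u1")]},
    {"author": "u3",
     "replies": [("u2", "u1"), ("u3", "u2")],
     "likes": [("u3", "u2")]},
]


def build_layers(threads):
    """Return three Counters of directed edges (i, j) -> raw count.

    g1: author i -> each distinct user j who commented in i's thread
    g2: commenter i -> user j to whom the comment is addressed
    g3: rater i -> user j whose message was liked
    """
    g1, g2, g3 = Counter(), Counter(), Counter()
    for thread in threads:
        author = thread["author"]
        commenters = {src for src, _ in thread["replies"]}
        for j in commenters:
            if j != author:          # assumption: no self-empowerment edge
                g1[(author, j)] += 1
        for src, dst in thread["replies"]:
            g2[(src, dst)] += 1
        for src, dst in thread["likes"]:
            g3[(src, dst)] += 1
    return g1, g2, g3


g1, g2, g3 = build_layers(EXAMPLE_THREADS)
```

All three layers share the same node set (u1, u2, u3) and differ only in their edges, which is the defining property of a multiplex network.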
In the first layer, $G_1(N, E_1)$, for any thread $\theta$ initiated by any author $i$, we add an edge $e_1(i, j)$ with $w_\theta(i \to j) = 1$, whenever $j$ posts a comment in $\theta$. This models the fact that author $i$ has empowered user $j$. The cumulative empowering weight of an edge $e_1(i, j)$ is computed as: The adjacency matrix $E$ (empowerment matrix) associated with $G_1$ is left stochastic, with each column summing to 1.
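The left-stochastic property mentioned above (each column summing to 1) can be illustrated with a short Python sketch that normalizes a table of raw edge weights column by column. The indexing convention, in which the entry for the edge (i, j) sits in row i and column j, as well as the function name `column_normalize` and the sample counts, are assumptions made for this example and do not reproduce the exact weighting of the model.

```python
def column_normalize(weights):
    """Scale non-negative weights so that every non-empty column sums to 1.

    weights: dict mapping (i, j) -> weight, read as row i, column j
    (an assumed convention). Columns whose total is 0 are dropped.
    """
    col_sums = {}
    for (_, j), w in weights.items():
        col_sums[j] = col_sums.get(j, 0.0) + w
    return {
        (i, j): w / col_sums[j]
        for (i, j), w in weights.items()
        if col_sums[j] > 0
    }


# Hypothetical raw empowerment counts between three users.
raw = {("u1", "u2"): 1, ("u1", "u3"): 1, ("u3", "u1"): 1, ("u3", "u2"): 2}
E = column_normalize(raw)

# Sum of the normalized entries in the column of u2.
col_u2 = sum(w for (_, j), w in E.items() if j == "u2")
```

A right-stochastic matrix is obtained in the same way by normalizing over rows instead of columns.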
Edges $e_2(i, j)$ in $G_2(N, E_2)$ are created whenever a user $i$ posts a comment for the first time in a thread $\theta$ authored by $j$. Subsequent answers update the weight of $e_2(i, j)$. Given the sequential thread structure, comments are considered as implicitly directed to the author, unless the message includes the name of another recipient in the thread (as in the example of Figure 2). Note that authors themselves can add comments during a discussion, and, in this case, they are treated as commenters. Any comment $h$ in a thread $\theta$ has a weight computed as follows: where $k$ is the sequential order of the comment in the thread ($k = 1$ if a user is the first who replied). This models the idea that the first commenter is more competent, or more willing to collaborate, than others (alternatively, $k$ can be set to represent the difference between the timestamp of the author's message and that of the commenter). In the formula, $\pi_1$, $\pi_2$ are parameters that we experimentally set to 0.5. The cumulative collaboration weight of an edge $e_2(i, j)$ sums all answers provided by user $i$ to user $j$ in any thread within a considered timespan $W$: The collaboration matrix $C$ (the adjacency matrix associated with $G_2$) is, as for $E$, left stochastic.
The third layer G 3 (N, E 3 ) models the rating activity of users. Edges e 3 (i, j) are weighted with the trust w t (i → j) of rater i in user j. We first introduce the quantity: where m(j) is the set of messages generated by user j (either as an author or as a commenter), h is a message in m(j), and δ h (i → j) = +1 if i likes h, −1 if i dislikes, and 0 if no opinion is expressed. As for Formula (9), parameters π 3 and π 4 are set to 0.5. According to Equation (7), 0 ≤ trust < 0.5 indicates distrust. We then define: The T matrix (the adjacency matrix associated with G 3 ), named the credibility matrix, is right stochastic.
We further denote with M the 3-way N × N × 3 tensor of the multiplex network. The third dimension of the network represents the network leadership qualities discussed at the beginning of the section: empowerment, collaboration, and credibility. We now need to measure the centrality of nodes in the multiplex network, to identify highly central agents.
A simple assumption would be to compute, for every user, the empowerment, collaboration, and credibility ranks r e , r c , and r t using monoplex PageRank [22], and then computing some heuristic function to combine these indicators, for example, using a regression, as in Reference [9]. However, a better assumption is to postulate a mutual reinforcement relation among layers, i.e., that the centrality of each node in one layer affects the centrality of the same node in any other layer. Therefore, to compute our measure of network leadership centrality, we use Multiple PageRank (MPR), introduced in Reference [43].
First, we note that PageRank centrality depends on the in-degree of nodes, while, according to our formulation, empowerment and collaboration of a node depend on the weight of outgoing edges. Therefore, we need to invert the direction of edges in the corresponding graphs. We denote with E′, C′ (where ′ denotes the transpose), and T the slices of the M tensor; and, with w′ e (j → i) = w e (i → j), w′ c (j → i) = w c (i → j), and w t (i → j), the corresponding matrix cells.
Our formulation of MPR follows the interaction model of Figure 3. In the credibility layer, we have: which corresponds to the standard monoplex PageRank formulation with teleporting. In fact, as shown in Figure 3, the activity of raters is not influenced by the other layers. Note that this is not true, in general, since a rater might be influenced by the role of the rated node (an employee might be more prone to place a "like" on his/her boss's thread or comment); however, we here assume objectivity of raters. In each of the other two layers (empowerment and collaboration), in analogy with Reference [43] (in Halu et al.'s formulation, any layer k is impacted only by the previous k − 1 layer), we include the multiplicative and additive effect of the other two layers, as follows: where the symbol ⟨…⟩ indicates the average operator, and the exponents β, γ ≥ 0 are set to tune the influence of one layer on the others. Equations (8) and (9) can be interpreted as follows: the first term of the equations shows that the rank of a node i is determined by the discounted rank of its neighbors in the same layer (as in the original PageRank formulation) multiplied by its average rank in the other two layers k − 1, k + 1, raised to the power β. The second term reflects the contribution to node i's rank deriving from its average importance in layers k − 1, k + 1, raised to the power γ. By adding the second term, nodes that are not able to attract important nodes in one layer can still gain importance by virtue of their centrality in other layers. The second term represents the multiplex formulation of teleporting, where the teleporting factor α k is layer-dependent.
Note that, since in a multiplex network layers are not connected (contrary to the more general category of multilayer networks), teleporting does not allow a random walker to jump from one node of a layer to another node of another layer, though the probability of the destination node is influenced by its rank in the other layers.
The linear system of Equations (7)-(9) can be computed using the stationary iterative method: where r t is the rank vector in iteration t, and A is the matrix of coefficients of the linear system. We note that the three matrices E′, C′, and T are stochastic, which is a necessary, but not sufficient, condition for convergence. However, as for monoplex PageRank, teleporting is used to force primitivity of the original stochastic matrix, which ensures convergence of the iterative method according to the Perron-Frobenius theorem (https://www2.warwick.ac.uk/fac/sci/maths/people/staff/oleg_zaboronski/fm/pf_theory.pdf, accessed on 1 September 2021), thus deriving the three stationary rank vectors r k . To efficiently calculate stationary values, an iterative "divide et impera" strategy is adopted, in line with References [43,51] (details are omitted for brevity, and interested readers should refer to those references). Efficiency is crucial since, even if the size of an enterprise network is far smaller than that of a world-wide network, we are interested in generating a real-time ranking of users according to variable parameters, as mentioned later in Section 6.
Stationary values for Equations (7)-(9) are combined in a balanced way to compute the leadership centrality rank (LCR(u); see Equation (1)) for all co-workers.
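To make the mutual reinforcement among layers concrete, the sketch below iterates a three-layer rank computation in plain Python. It is a hedged illustration rather than the exact Equations (7)-(9): the update rule is an assumed form modeled on the multiplicative and additive biasing described above (rank flow scaled by the average rank in the other two layers raised to β, teleporting scaled by the same average raised to γ), and the function names, the renormalization after each step, and the simple fixed number of iterations are all hypothetical choices. Each input matrix is assumed to be already oriented so that `adj[i][j]` is the weight that node j passes to node i.

```python
def pagerank_step(adj, rank, bias_mult, bias_add, alpha):
    """One biased PageRank update; adj[i][j] is the weight passed from j to i."""
    n = len(rank)
    # Column normalizers that include the multiplicative bias of receivers.
    col_norm = [sum(adj[i][j] * bias_mult[i] for i in range(n)) for j in range(n)]
    add_total = sum(bias_add)  # equals N times the average additive bias
    new_rank = []
    for i in range(n):
        flow = sum(bias_mult[i] * adj[i][j] * rank[j] / col_norm[j]
                   for j in range(n) if col_norm[j] > 0)
        new_rank.append(alpha * flow + (1 - alpha) * bias_add[i] / add_total)
    total = sum(new_rank)  # renormalize (mass of dangling nodes is dropped)
    return [v / total for v in new_rank]


def multiplex_pagerank(E, C, T, alpha=0.85, beta=1.0, gamma=1.0, iters=100):
    """Return (r_e, r_c, r_t) for empowerment, collaboration, credibility."""
    n = len(T)
    ones = [1.0] * n
    # Credibility layer: plain PageRank, not influenced by the other layers.
    r_t = [1.0 / n] * n
    for _ in range(iters):
        r_t = pagerank_step(T, r_t, ones, ones, alpha)
    # Empowerment and collaboration layers reinforce each other.
    r_e = [1.0 / n] * n
    r_c = [1.0 / n] * n
    for _ in range(iters):
        m_e = [(r_c[i] + r_t[i]) / 2 for i in range(n)]  # other layers, seen from E
        m_c = [(r_e[i] + r_t[i]) / 2 for i in range(n)]  # other layers, seen from C
        next_e = pagerank_step(E, r_e, [m ** beta for m in m_e],
                               [m ** gamma for m in m_e], alpha)
        next_c = pagerank_step(C, r_c, [m ** beta for m in m_c],
                               [m ** gamma for m in m_c], alpha)
        r_e, r_c = next_e, next_c
    return r_e, r_c, r_t


E = [[0, 1, 0], [1, 0, 1], [1, 0, 0]]
C = [[0, 1, 1], [1, 0, 0], [0, 1, 0]]
T = [[0, 0, 1], [1, 0, 0], [1, 1, 0]]
r_e, r_c, r_t = multiplex_pagerank(E, C, T)
```

With β = γ = 0 the biases vanish and each layer reduces to ordinary monoplex PageRank, which gives a convenient sanity check for any implementation of this kind.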

Measuring Brokerage of Network Leaders
Network leaders, as we said, should also be good intermediaries [13]. We use the leadership brokerage rank (LBR(u)) to highlight intermediary roles of leaders. Brokers (also called key players or bridges) are actors whose main role is to connect communities. To compute brokerage rank, we use the KPP-NEG algorithm described in Reference [52], which rates nodes on the basis of the amount of network fragmentation induced by their removal. Note that brokerage, contrary to PageRank centrality, is a property of individual actors and is not affected by the rank of connected individuals. For this reason, the measure of brokerage is not incorporated into the multiplex network model but is, rather, considered as a separate and complementary quality, as shown in Equation (1).
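The idea of rating a node by the fragmentation caused by its removal can be illustrated with a short sketch. This is a simplified single-node variant, not the full KPP-NEG procedure of Reference [52]: it assumes an undirected, unweighted graph and uses the common fragmentation index F = 1 − Σ s k (s k − 1) / (n(n − 1)), where s k are the component sizes. The function names are hypothetical.

```python
def component_sizes(nodes, edges):
    """Sizes of connected components of an undirected graph."""
    adj = {u: set() for u in nodes}
    for u, v in edges:
        if u in adj and v in adj:  # ignore edges touching removed nodes
            adj[u].add(v)
            adj[v].add(u)
    seen, sizes = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        sizes.append(size)
    return sizes


def fragmentation(nodes, edges):
    """Share of node pairs that cannot reach each other (0 = connected)."""
    n = len(nodes)
    if n < 2:
        return 0.0
    reachable_pairs = sum(s * (s - 1) for s in component_sizes(nodes, edges))
    return 1.0 - reachable_pairs / (n * (n - 1))


def brokerage_scores(nodes, edges):
    """Increase in fragmentation caused by removing each node in turn."""
    base = fragmentation(nodes, edges)
    return {u: fragmentation([v for v in nodes if v != u], edges) - base
            for u in nodes}


nodes = ["a", "b", "c", "x", "d", "e", "f"]
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "x"),
         ("x", "d"), ("d", "e"), ("e", "f"), ("d", "f")]
scores = brokerage_scores(nodes, edges)
```

In this toy graph, two triangles are joined only through node x, so removing x splits the network in two and x obtains the highest score, while removing a peripheral node such as a causes no fragmentation.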

Topic Leaders
The previously described leadership measures can be applied to an enterprise network in general; however, it may be of interest to assess leadership with reference to specific topics. A user may be highly credible when he/she discusses, e.g., mobile apps, and be much less confident on business models. Consequently, network leadership should be analyzed also in the context of users' competence. To this end, our aim is to extract topic networks, i.e., networks of users focused on specific topics.
In TamTamy, the content of threads, besides being multilingual and mixed-language, also differs greatly in the type of discussed topics. A few threads are on leisure topics (for example, the organization of a football match); however, the majority are on technical or administrative topics. We perform topic extraction in two steps: first, by learning relevant terminology from threads; and, second, by generating clusters of co-occurring terms. Finally, we also classify topics into more general categories.
A common topic learning approach in literature is to use stemmed words as items, and then to cluster items using a latent topic model (such as Latent Dirichlet Allocation (LDA) [53] or one of its many variants). This solution turned out to perform poorly due both to mixed linguality and to the reduced dimension of messages. To extract more meaningful terms (with reference to the company's competences and topics), first, we index only "content" tokens, or concepts, identified as those words mapping with BabelNet [54], a freely available semantic network covering more than 50 languages and more than 13 million concepts. In this way, the specific language in which a concept is expressed does not matter. Then, we extract concept n-grams that are either consecutive (i.e., compounds) or separated by prepositions and determiners. For example, in the sentence: "This technology has been around for over three years and has been used in Macy's for marketing purposes for months now", only the bold tokens are indexed.
Finally, we extract concept cliques using the Bron-Kerbosch clique detection algorithm, as described in Reference [55], with the restriction that each element in the clique should exceed an experimentally defined frequency threshold φ (we experimentally set φ = 3). An example of topic τ is the following:

investimenti_online digital_marketing perspective_engage analisi_delle_performance

Note in the example the presence of several "mixed language" concepts (e.g., investimenti online, analisi delle performance, where words are either in English or Italian). This is rather common in work environments where the usage of English technical terms dominates. Topics are extracted within temporal windows of length W (we experimented with different values of W ranging from weeks to years). Cosine-similarity is used to cluster topics vertically (within the same W) and horizontally along the temporal line, in order to generate topic streams s(τ). An example of two topics assigned to the same stream is listed below:

TOPIC#:169: user_experience, news_pay, carte_di_credito, credit_card, metodo_di_pagamento, american_express, pagamenti_online
TOPIC#:162: mobile_pos, pay_reply, user_experience, soluzione_di_mobile_pos, sistemi_di_pagamento, circuiti_di_pagamento, gestione_coupon

Since this is not particularly relevant for the scope of the paper (alternative topic extraction methods could be used), we do not compare our algorithm with other topic detection algorithms in literature in detail; however, we mention that topics extracted with different methods have been comparatively evaluated by our project partners in Reply (this was the only possible option for evaluation, since many keywords are obscure for external evaluators), who found the solution proposed here to produce significantly more meaningful topics than, e.g., using LDA.
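The clique-based topic extraction step can be sketched as follows. This is an illustrative sketch, not the authors' pipeline: it assumes that messages have already been reduced to lists of concept n-grams, that two concepts are linked when they co-occur in at least one message, and that the frequency restriction is applied as "appears in at least φ messages". The basic (non-pivoting) Bron-Kerbosch recursion is used, and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_graph(messages, phi=3):
    """Link concepts that co-occur in a message, keeping only frequent ones."""
    freq = Counter(term for msg in messages for term in set(msg))
    kept = {term for term, count in freq.items() if count >= phi}
    graph = {term: set() for term in kept}
    for msg in messages:
        for a, b in combinations(sorted(set(msg) & kept), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def bron_kerbosch(graph, r=None, p=None, x=None):
    """Yield all maximal cliques of an undirected graph (basic version)."""
    if r is None:
        r, p, x = set(), set(graph), set()
    if not p and not x:
        yield frozenset(r)
        return
    for v in list(p):
        yield from bron_kerbosch(graph, r | {v}, p & graph[v], x & graph[v])
        p = p - {v}
        x = x | {v}


messages = (
    [["mobile_pos", "credit_card", "user_experience"]] * 3
    + [["big_data", "hadoop"]] * 3
    + [["big_data", "rare_term"]]
)
topics = set(bron_kerbosch(cooccurrence_graph(messages, phi=3)))
```

Here the infrequent concept `rare_term` is filtered out before clique detection, so two candidate topics remain: one about payments and one about big data.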
Given a topic stream s(τ) within a temporal window W, we are then able to generate the network of users participating in τ, and to derive all measures described in the previous Section with reference to τ. This is particularly relevant as far as credibility and collaboration are concerned, since both ranks may depend on specific discussion topics.
Finally, we aim to classify both topics and threads according to three macro-categories, specifically: technical (T), leisure (L), and organization/administrative (O/A).
In order to perform this task, first, we manually annotated about 300 keywords in each of the three categories; next, we created a context vector for each keyword based on their co-occurring keywords in threads; and, finally, we learned a contextual model for each category (i.e., the centroid of member keywords). To classify keywords, we compute the cosine-similarity between their context vector and the category vector, and we assign a keyword to a category if the similarity exceeds a threshold Θ (see Section 5.5). Next, based on keywords' categories, we compute the score of a topic τ in each category as follows: where k j is a keyword in τ, w j is its weight, and C i is one of the three categories. A topic τ is then assigned to a category C i based on: Note that not only a topic τ but also a thread θ can be classified in the very same way. Since the objective of the classification is to analyze users' leadership with reference to the three macro-categories, we only assign a category C i (i = T, L, O/A) to a thread or topic if the normalized score of the argmax category exceeds that of the second-ranked category by 40%. This allows us to analyze only threads and topics which are more "focused"; furthermore, we obtain a high classification performance: we estimated 92% precision on a random sample of 100 topics and 100 threads. Overall, we automatically classified 4264 non-zero threads (see Section 5.2) and 393 topics.
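The category assignment with a focus margin can be sketched as follows. The scoring formula used here is an assumption (the category score is taken as the sum of the weights of the topic's keywords assigned to that category, normalized over the three categories), and the 40% rule is interpreted as a relative margin, i.e., the best normalized score must be at least 1.4 times the second one. Function and variable names are hypothetical.

```python
def classify_topic(keyword_weights, keyword_category, margin=0.40):
    """Return "T", "L" or "O/A" for a focused topic, else None.

    keyword_weights:  {keyword: weight} for the keywords of the topic.
    keyword_category: {keyword: category} from the keyword classifier.
    """
    scores = {"T": 0.0, "L": 0.0, "O/A": 0.0}
    for keyword, weight in keyword_weights.items():
        category = keyword_category.get(keyword)
        if category in scores:
            scores[category] += weight
    total = sum(scores.values())
    if total == 0:
        return None  # no classified keyword in this topic
    ranked = sorted(((s / total, c) for c, s in scores.items()), reverse=True)
    (best, best_category), (second, _) = ranked[0], ranked[1]
    # Assumed reading of the rule: best exceeds second by a 40% relative margin.
    return best_category if best >= (1 + margin) * second else None


categories = {"hadoop": "T", "big_data": "T", "football": "L", "budget": "O/A"}
focused = classify_topic({"hadoop": 2, "big_data": 2, "football": 1}, categories)
mixed = classify_topic({"hadoop": 1, "budget": 1}, categories)
```

The first topic is dominated by technical keywords and is labeled T, while the second one is evenly split between two categories and is left unclassified, mirroring the choice to keep only "focused" topics and threads.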

Summary of the Parameters
In this section, we summarize and discuss the parameters of our computational model of Network Leadership. In Table 1, for each parameter, we provide a brief description and the default values that we also used in the use-case study discussed later in Section 7. As already remarked in Section 5, the parameters of the computational model of network leadership should be set in agreement with the managerial strategies of a company, as a complementary support for decisions on career advancements. For example, thanks to α 1 and α 2 , managers can find their preferred trade-off between the leadership centrality and the brokerage ranks. Similarly, it is possible to control the importance given to commenters (π 1 , π 2 ) and raters (π 3 , π 4 ). Apart from the teleporting parameters α t , α e , and α c , which are set to 0.85 as originally proposed by the authors of Reference [22], the β and γ parameters enable an even more fine-grained control of the model by calibrating the mutual influence among the three layers (i.e., trust, collaboration, and empowerment). Indeed, the capability to adapt our model to the strategic needs of managers is one of the relevant features of our proposed framework.

Table 1. Summary of the parameters of our computational model of Network Leadership. We remark that φ and Θ values were tuned to optimally perform Topic Leaders analysis with our dataset.

Finally, we empirically tuned the φ and Θ parameters (see Section 5.4) and found that φ = 3 and Θ = 0.80 are optimal thresholds for topic leaders analysis with our dataset.

The Enterprise Social Analytics Dashboard
To support social analytics, we implemented an ESAD which provides a large number of real-time statistics, trend graphs, topic graphs and stream graphs, users' and global rankings, and more. Furthermore, sub-networks can be derived according to several parameters, such as timespan, gender, roles, discussed topics, and single keywords. As we previously remarked, the dashboard is designed for both descriptive and prescriptive analytics, to support managers in the complex task of deciding career advancements (we remark that, in Reply, complementing decisions on career advancements with information on employees' social behavior in the TamTamy network is an established practice, as it probably happens in several networked companies), and to identify and diagnose abnormal social behaviors, either positive or negative. Figure 4 shows the visualization of the social graph when selecting a number of parameters, as in the upper part of the Figure. Females are represented by pink squares, and males by blue circles. The dimension of nodes reflects the network leadership of the users, and thickness of edges represents the intensity of communications between two users.  Figure 5 represents a topic network, where the topic is big data. Topics can be selected either from the set of the top 10 topics shown in the window Topics, or with a free query. Note that detecting topic networks and leaders is particularly helpful, for example, in real-life settings in which a manager must select the project leader and team of some specific project. In large, networked companies, competences are distributed and often not formally cataloged; therefore, competence identification is not obvious.   Other more traditional functions are available, such as general statistics, temporal series, and ego networks. We remark that the Fiordaliso dashboard has been recently integrated with the TamTamy platform.

Use-Case Analysis
A quantitative assessment of the performance of our network analytics models is not possible, since companies do not disclose their promotion decisions, nor the motivations behind a detected anomaly. Therefore, we dedicate this section to demonstrating the utility of our ESAD for analyzing a number of issues which are known to be relevant in organizations, and which are not supported by state-of-the-art social analytic platforms. As we already mentioned, there are two types of users for the ESAD presented in this paper:
• Managers: Managers can use the platform to support career advancements, by inspecting the network leadership of selected candidates, or by identifying possible candidates not previously considered (as in Figure 4); they can use the platform to form a project team, by generating topic networks (as in Figure 5); they can analyze anomalies in users' and groups' social behaviors and decide whether to take appropriate actions.
• Social Analysts: Large companies today often include or engage specialized teams for social analytics: this is the case, for example, in Reply. Social analysts can be supported by our platform in deriving general statistics on the behavior of coworkers. Useful questions that can be answered through our platform are: What have been the main topics of discussion in a given timespan (as in Figure 6)? Are there significant differences in the social behavior of coworkers, according to time, role, and gender? Does the social network encourage the sharing of knowledge (e.g., does the dimension of topic networks, on selected topics considered central, increase or decrease? Is there a tendency to create closed circles?), etc.
Examples of the latter two types of analytics are provided in this section.
We experimented with our platform and algorithms on a 3-year dump (2012-2014) of anonymized and partly amended (i.e., without attachments) TamTamy threads, mostly in English, Italian, and German. Overall, the TamTamy dataset includes over 50,000 threads from around 2000 different users. To the best of our knowledge, this is the largest and longest enterprise dataset considered in literature: for example, the dataset used in Reference [9] spans only 6 months, while the popular Enron dataset, used in many studies, includes mails from only 158 users, mostly senior management of Enron. Other studies are based on survey data of some dozens of employees. Table 2 presents summary data. The table shows aggregate statistics for the three years, and additional data by gender and role. We show the percentages only for executives and for females, since the percentages for employees and for males can be, respectively, computed by complementing the shown statistics. Table 2 also shows the network leadership measures presented in Section 5: leadership centrality rank (LCR, Section 5.2) and leadership brokerage rank (LBR, Section 5.3). The dashboard also supports the computation of more traditional measures, such as degree centrality. As we already remarked in Section 5 and summarized in Section 5.5, the setting of parameters in our leadership model depends on managerial choices: for example, centrality can be considered more important than brokerage, competence on specific topics might be required, and the mutual influence between collaboration, empowerment, and trust can be adjusted. In our use case, to compute LCR, parameters β and γ in Equations (8) and (9) are all set to 1, while the teleporting factors α are set to 0.85, which is a commonly used value. Furthermore, as stated in Section 5.2, leadership measures are based only on non-zero threads.
The last three lines in Table 2 show the % of females among top LBR, top LCR, and those who are both top LBR and top LCR, i.e., the Network Leaders (NL), according to what we stated in Section 2. Top leaders have been empirically selected as those with an LBR or LCR value v ≥ avg(v) + 2 × SD(v) during the analyzed period. These individuals represent more or less 6-7% of the population, which seems a reasonably selective threshold. Hereafter, we analyze our data in more detail, with the aim of answering potentially interesting questions for a company's social analyst (we remark that empirical choices and usage examples described in this paper have been discussed with our industrial partner during the Fiordaliso project). We begin by testing the hypothesis that social networks are congenial to female leadership, as argued, e.g., in References [14,15]. In this respect, the analysis of Table 2 provides some evidence and suggests the following comments:
1. Females are more authors than commenters: While the number of active females stays more or less the same during the three years (20-23%), women progressively became more proactive (female thread authors increase from 15% to 25%); furthermore, executive (managers or partners) women became significantly more active than their male counterparts, since active female executives have been 41% of active executives in 2014, even though female executives were 27%. Statistical significance (p ≤ 0.01) against the null hypothesis was assessed using the z-test (https://en.wikipedia.org/wiki/Z-test, accessed on 1 September 2021). In our case, the null hypothesis is that the measured statistic of a conditional event is only marginally different from the prior probability of the unconditioned event; for example, here, we compare the observed percentage of active female executives with the prior probability of female executives. These data show that, as the use of TamTamy as a collaboration tool was progressively encouraged by the company, women increased their empowerment ability. Instead, the percentage of female commenters shows some variability, with a peak in 2013.

2. Females excel as network brokers: Table 2 shows that women are systematically among the top brokers. The percentage of women in top brokers is significantly (p ≤ 0.01, using the z-test described above) higher than the percentage of active female users. Women excel in the ability to share information, and their intermediary role is highlighted in the ESN.

3. Females are no different from males as central leaders: The table also shows that women are among top central leaders proportionally to their presence (though we noted that the majority of these top leaders are placed in the highest positions of the ranking), but this percentage drops to 7% in 2014. However, as also shown in the last column of Table 2, during the first part of 2014, the percentage of women in top leaders was more or less the same as in previous years. The second semester of 2014 is then to be considered an anomaly, which was caused, as we could diagnose by inspecting users' activity time series, by the sudden and total disappearance from the network of four women with top LCR. Unfortunately, the motivations for this drop-off could not be disclosed by Reply. However, we note that the ability to detect anomalies is a key feature of the computational framework presented in this paper. We further note that the drop in female LCR does not correspond to a drop in female brokerage since, as explained in Section 5, the LCR of a node influences those of the other connected nodes, while the same does not apply to LBR. The loss of a woman with high LCR, thus, has a much more negative effect on the other connected nodes than the loss of a woman with high LBR, whose effect is rather on the global network connectivity.

4. Female top brokers are often also top central leaders: The most remarkable result is that the % of female Network Leaders (NL) is significantly higher than the prior probability of female co-workers, except in the second "anomalous" semester of 2014, in which we already noted a drop in LCR. When considering the aggregated result along the three years, however, there were 47% women in the intersection between top LBR and top LCR co-workers. This is motivated by the fact that there is a high overlap between top female central leaders and brokers, while the same is not true for males, who are either central leaders (slightly more than females) or brokers (significantly less than females), but rarely both.
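The selection rule for top leaders stated before the list (v ≥ avg(v) + 2 × SD(v)) and the intersection that defines Network Leaders can be sketched as follows. The use of the sample standard deviation and the function names are assumptions made for illustration only.

```python
import statistics


def top_users(rank_by_user):
    """Users whose rank is at least two standard deviations above the mean."""
    values = list(rank_by_user.values())
    threshold = statistics.mean(values) + 2 * statistics.stdev(values)
    return {user for user, value in rank_by_user.items() if value >= threshold}


def network_leaders(lcr_by_user, lbr_by_user):
    """Network Leaders: users who are both top LCR and top LBR."""
    return top_users(lcr_by_user) & top_users(lbr_by_user)


# Toy ranks: twenty ordinary users and one clear outlier.
lcr = {f"u{n}": 1.0 for n in range(20)}
lcr["x"] = 10.0
lbr = dict(lcr)
leaders = network_leaders(lcr, lbr)
```

In the toy data the mean is about 1.43 and the sample standard deviation about 1.96, so the threshold is about 5.36 and only user x qualifies on both ranks.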
Taken together, our evidence suggests that women are well aware that knowledge sharing and networking have become increasingly important activities in today's organizations [56], and, despite their being a minority, they play a primary role especially in engaging and connecting with their colleagues. Considering the collaborative and knowledge-sharing purpose of the TamTamy network, the higher intermediary (brokerage) ability of women in such ESN reveals their distinctive inclination to participative and collaborative behaviors. These findings suggest that the more "feminine" setting created by an ESN leads to the emergence of women as leaders, compared to more "masculine"-oriented environments.

Knowledge Sharing: Are There Role/Gender Barriers?
An important aspect of co-workers' interaction is homophily, defined as the tendency of individuals to associate and link with similar others. Several studies on the effect of homophily in interpersonal relations (e.g., References [30,57]) confirmed that homophily can also become a barrier to effective knowledge sharing, when strongly homophily-oriented individuals are not exposed to external stimuli and new experiences. In fact, while facilitating coordinated action by managers, homophily may adversely restrict decision-making options [13]. In this section, we investigate homophily by role and gender.
First, we introduce a model of homophily. Since, in a thread-based model, authors post a message without specifying the addressee, only the action of commenting a message may, or may not, be influenced by the gender of the author. Therefore, denoting with g (= male, female) the gender to which a user x belongs, we model the g-homophily as the conditional probability that a g commenter answers a g author: P(g-commenter | g-author) = P(g-commenter ∩ g-author) / P(g-author).
Accordingly, we state our homophily hypotheses as follows: Hypothesis 1. (neutrality): There is no statistically significant difference between the probability that a g-commenter answers a g-author and the prior probability of g-commenters.
Hypothesis 2. (homophily): The probability that a g-commenter answers a g-author is significantly higher than the prior probability of g-commenters.
To test our hypotheses, we divide the three-year dataset into 3-month slots, and, in each slot, we estimate P(g-commenter ∩ g-author) as the number of messages posted by a g-commenter in answer to a g-author (purging additional comments posted by the author itself), while P(g-author) and P(g-commenter) are estimated by the number of such messages addressed to a g-author or originated by a g-commenter, respectively. Next, we apply the Wilcoxon signed-rank test (https://en.wikipedia.org/wiki/Wilcoxon_signed-rank_test, accessed on 1 September 2021), which is a paired difference test. Since the Wilcoxon test implies counting the number of times the difference between pairs is positive or negative, without considering the amount of the difference, we also compute a z-test over the full period to rank significant results (according to the Wilcoxon test) as highly or moderately significant. Figure 7 plots prior and conditional probabilities and shows that both males and females are homophilous (according to the Wilcoxon test); however, females are considerably more homophilous than males according to the z-test: in fact, for females, Hypothesis 2 is accepted with p = 7.12 × 10^−8, while, for males, we obtain p = 0.02111 (hence, p ≥ 0.01). These results are in contrast with References [30,31], which did not detect substantial homophily among women; however, their experiments were conducted on friendship networks, representing a less formal environment than an ESN. A homophily result similar to ours was instead obtained in Reference [28] with reference to an online combat game: this type of social network is more similar to an ESN than to a friendship network, since relationships are established according to confidence and competence, rather than socialization needs.
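The comparison between the conditional probability and the prior can be sketched with a one-sided, one-proportion z-test. This is an illustrative sketch: the paper does not specify the exact variant of the z-test, so the statistic below (observed conditional proportion against the prior, with the prior used in the standard error) is an assumption, as are the input format and the function name.

```python
import math


def homophily_z_test(replies, g):
    """Test whether g-commenters answer g-authors more than expected.

    replies: list of (commenter_gender, author_gender) pairs, one per comment.
    Returns (conditional, prior, z, p_value) for gender g.
    """
    to_g_author = [c for c, a in replies if a == g]
    n = len(to_g_author)
    if n == 0:
        raise ValueError("no comments addressed to authors of this gender")
    # P(g-commenter | g-author): share of g-commenters among answers to g-authors.
    conditional = sum(1 for c in to_g_author if c == g) / n
    # Prior P(g-commenter): share of g-commenters among all comments.
    prior = sum(1 for c, _ in replies if c == g) / len(replies)
    z = (conditional - prior) / math.sqrt(prior * (1 - prior) / n)
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # one-sided upper tail
    return conditional, prior, z, p_value


# Toy data: 200 comments, labeled by (commenter gender, author gender).
replies = ([("f", "f")] * 30 + [("m", "f")] * 20
           + [("f", "m")] * 20 + [("m", "m")] * 130)
conditional, prior, z, p_value = homophily_z_test(replies, "f")
```

In this toy data, women write 25% of all comments but 60% of the comments addressed to female authors, so the neutrality hypothesis would be rejected in favor of homophily.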
To assess the influence of roles, commenters and authors were grouped into 8 categories, depending upon their gender g and role r (= executive, employee). In this experiment, we tested all possible combinations of commenter and author. Each individual may belong to any of 8 categories: female, male, executive, employee, female-executive, female-employee, male-executive, and male-employee, for a total of 8 × 8 significance tests. For example, we test if female executives have a strong tendency to comment threads authored by a female, or by an executive, or by a female executive. As before, the case for neutrality (see Hypothesis 1) is one in which the estimated conditional probability is not statistically different from the prior probability of a C-commenter, where C is one of the 8 categories. We here summarize only the relevant outcomes.
In agreement with Reference [9], we noted that roles do influence social relations, but this is definitely more true for males. The main result emerging from this experiment is that males are influenced primarily by roles and then by gender: in fact, male executives mostly answer to male executives (Hypothesis 2 is accepted with p = 1.555 × 10^−9) and male employees mostly answer to male employees (p = 0.0004347), while they are anti-homophilous towards males with a different role and anti-heterophilous towards females, in general. Women instead are influenced primarily by gender since they show neutrality (see Hypothesis 1) towards females in different roles. Furthermore, female executives are more "democratic" than male executives, since they show neutrality towards employees (p = 0.381), while male executives are strongly anti-heterophilous towards employees (p = 4.006 × 10^−8).

Competence Analysis: Are There Detectable Gender Differences?
Finally, we analyzed gendered network behavior as a function of discussed topics. Our results show that women are active also in networks dealing with very technical topics. For example, the previous Figure 5 shows that the social network of users discussing big data is led by a woman; furthermore, there are three women among the top five LCR.
A more systematic analysis of competences as a function of topic and thread categories is summarized in Table 3. Topics and threads have been classified as described in Section 5.4. We recall that we considered only a subset of 393 topics and 4261 threads showing a strong membership to a single category. The first two rows of the table show, as expected, that the ESN is used mainly to discuss technical and organization/administrative issues, while leisure topics are a minority. The subsequent rows show the percentage of women among the first top leaders and brokers in the two networks originated by the subset of classified topics and threads. The percentage of women participation in these two networks was 20.7% and 20.1%, respectively. These values represent the prior probability against which values in rows 3-6 of Table 3 should be compared.
When restricting to focused topics, women's leadership and intermediary skills are significantly greater than what would be expected given their minority presence in the network. These results are stronger than in Table 2, showing that, as pointed out in several recent articles (e.g., http://www.theatlantic.com/magazine/archive/2014/05/the-confidence-gap/359815/, accessed on 1 September 2021), women are more competent than confident. Table 3 also clearly shows that women are equally active in technical and organization topics. This is very interesting since it contradicts the stereotype that women are less competent than men in technical subjects. In conclusion, our case study, supported by the implemented platform and measures, has led to a number of interesting results concerning the common behaviors of co-workers:

1. Women are significantly more gender homophilous than males, while males are homophilous by role in the first place, and then by gender.

2. The network leadership analysis has clearly shown that women co-workers are among top central leaders in proportion to their presence in the network, but they are top brokers significantly more often than what would be expected given their presence in the network. We also found that female central leaders are also top brokers more often than males; thus, females embody the notion of network leader better than males. Furthermore, female leaders, contrary to what is observed for the total female population, are gender-neutral, as it should be: a good leader is willing to collaborate and engage their colleagues regardless of gender.

3. Concerning competence, our topic-oriented classification of users' conversations has shown that women are equally competent (and leaders) in technical, administrative, and organizational issues, thus contradicting the stereotype of their lower expertise in technical subjects.

Conclusions and Summary of Results
In this paper, we present a model of network leadership in ESNs grounded on organizational studies. The model takes into consideration empowerment, collaboration, and trust as the main qualities of a leader, as well as creates a correspondence between these qualities and the users' actions observable in an ESN, such as authoring, commenting, or rating a post. Furthermore, these actions can be restricted to posts concerning specific topics, thus also introducing a notion of domain competence.
The ESN has been modeled as a multiplex network with three layers, each focusing on a specific type of interaction (authoring, commenting, or rating). The leadership quality of network members can be assessed by computing their eigencentrality and brokerage on each layer and then computing a cumulative leadership value that can be fine-tuned by managers and network analysts according to their specific needs and vision.
Furthermore, we designed an ESAD that, in addition to identifying network leaders, supports the analysis of the behavior of co-workers as a function of gender, role, and competence domains, with interesting findings. In a large use-case study on an existing ESN (i.e., TamTamy), we have clearly shown the power of social analytics in organizations as a tool for human capital management, skills enhancement, and early detection of potential problems.

Conflicts of Interest:
The authors declare no conflict of interest.