Network Analysis Based on Important Node Selection and Community Detection

The stability and robustness of a complex network can be significantly improved by determining important nodes and by analyzing their tendency to group into clusters. Several centrality measures for evaluating the importance of a node in a complex network exist in the literature, each one focusing on a different perspective. Community detection algorithms can be used to determine clusters of nodes based on the network structure. This paper shows by empirical means that node importance can be evaluated from a dual perspective: by combining the traditional centrality measures, which regard the whole network as one unit, with an analysis of the node clusters yielded by community detection. Not only do these approaches offer overlapping results, but they also provide complementary information regarding the top important nodes. To confirm this mechanism, we performed experiments on synthetic and real-world networks; the results indicate an interesting relation between important nodes at the community and network levels.


Introduction
The extensive literature on complex networks [1][2][3][4] demonstrates the importance of this field, which can be explained by its versatility: the ability to model a wide variety of real systems and its applicability in domains such as travel, commerce, biology, technology, sociology, chemistry and economics. At the level of the whole network, its dynamics and properties are encoded in its nodes and their wiring diagram, i.e., the links. A series of metrics exists to measure these basic elements according to different criteria. Looking at a higher level, nodes tend to form groups and clusters whose properties are just as significant as those of the individual nodes with respect to the nature of the network.
One important research direction in network science refers to the detection of important nodes. Several centrality metrics that can be engaged for this task are defined in the literature [3], while many studies continue to propose new measures and models in an attempt at better approaches [5][6][7][8]. The identification of important nodes in networks is a challenging but essential task with many high-impact applications in the real world. For example, in the case of a virus, it can help identify the key sources of spreading in order to prevent a possible epidemic. In a large-scale network, it is compelling to back up the servers in order not to lose important data: identifying the important nodes will increase the endurance and robustness of the network. Another example can be drawn from the series of large blackouts that happened in different countries over the last decades. A significant one is the power blackout that affected most of northern and eastern India on 30 and 31 July 2012. The blackout of 30 July affected over 400 million people, making it the largest power outage in terms of the number of people affected, surpassing the January 2001 blackout in Northern India, which affected 230 million people. Such high-impact situations could have been prevented had the key nodes been identified and better protected. The importance of nodes in a network can be determined using centrality measures, which have been widely used but also have some drawbacks. Each measure can work well in a certain case, but at the same time, each measure fails to capture other structural characteristics of the network. The traditional centrality measures identify a node as being the "most central" one, but they focus on information from different perspectives.
For example, in an analysis of the most popular nodes in a social network, prioritizing by betweenness centrality could be wrong; however, the same betweenness centrality measure, used in the analysis of epidemics, is the one that tells us which nodes should be immunized first, and it has proved to be the most reliable one in that case. In other words, each measure attempts to capture the nodes' importance from a certain perspective.
A community is a subset of nodes that interact with each other more than with the nodes outside that particular community [9]. The nodes within a community are more interconnected, with more edges between themselves and fewer edges towards nodes outside the community. Community detection is known to be a clustering problem, and many algorithms for the detection of communities have been proposed in the literature [10]. Nodes tend to cluster under the law of common interests, a dynamic that leads to structures with specific properties.
In this paper, we investigate the important nodes in different networks by aligning the results of centrality measures with community detection methods, analyzing cases where network-level important nodes detected by different centrality measures are hubs of communities, and vice versa. The aim of our study is to determine whether there is any relation between the various node centrality metrics and the hub-dominant nodes obtained via community detection. The main contributions of the paper are as follows: (i) a comprehensive overview of the literature on node centrality metrics and community detection algorithms; (ii) several experiments on real-world networks in order to confirm that nodes having top centrality measures overlap with those obtained by means of community detection. The analysis of results is based on centrality scores obtained by six well-known measures (i.e., degree centrality, betweenness centrality, closeness centrality, percolation centrality, eigen centrality and page rank), on the computation of hub dominance in relation to the detected communities, and on the use of the susceptible-infected-recovered model to verify the information propagation in the network. Computational experiments confirm that centrality measures and community detection yield common important nodes.
The paper is structured as follows. Section 2 presents, in a formal manner, definitions of centrality criteria. Community detection methods for finding important nodes are reviewed in Section 3. Section 4 describes the details of our approach and the flow of the experiments, while Section 5 presents the computational experiments, datasets used and the results of the analysis. Finally, we draw our conclusions in Section 6, formulating possible future directions as well.

Node Importance Measures
The identification of central nodes in complex networks has important theoretical significance for the structure, propagation and synchronization of complex networks, and practical value for understanding communication, control of information and transactions between nodes. A node's importance is quantified through various centrality properties. All these properties have in common that they reveal the nodes with a significant impact on the information flow of the network, the so-called central nodes; e.g., in a financial context, these central nodes can be large banks or large economic entities that proxy a significant number of transactions.
Degree centrality (the ratio between the total number of neighbors of a node and the number of nodes in the graph [11]) assumes that a node with a great number of adjacent nodes is very influential. This can be considered a measurement of importance, but it only considers information locally, without taking the global network structure into account. Closeness centrality [12] is expressed by the inverse of the average length of the shortest paths between a node and all other nodes in the network. Hence, the greater priority a node has, the closer it is to all other nodes in the network. It can also be considered a measurement of how easily information passes from a node to the others; a drawback of this method is the handling of disconnected components. Betweenness centrality [13] is expressed by the fraction of shortest paths between any two nodes that pass through the given node. It compensates for the deficiencies of degree and closeness centrality. Similar to degree centrality, eigen centrality (EC) [14] measures the influence of a node in a network based on the number of its neighbors. This centrality goes a step further by also taking into account how well connected a node is and how many links its connections have. By doing so, the algorithm can identify influential nodes throughout the whole network, not just those directly connected to a given node. Another metric is page rank (PR) [15], derived from eigen centrality, which also assigns nodes a score based on the total number of connections. The difference is that the page rank algorithm also takes link direction and weight into consideration. Finally, K-shell decomposition [16] (whereby nodes are divided into shells on the basis of their degree: nodes with degree 1 in one bucket, and so on) has been used in quite a number of applications as a tool for understanding the importance of nodes within a network structure.
K-shell decomposition can be visualized as a way to understand the "goodness" of a node as a distributor in a large-scale complex network. Roughly speaking, the bigger the k-shell index of a node, the better that node can act as a distributor in the network. The notions described above are summarized in Table 1.

Table 1. Overview of node centrality metrics.

Metric                  Abbreviation  Ref.  Description
Degree centrality       DC            [11]  Number of adjacent nodes, relative to the total nodes.
Betweenness centrality  BC            [13]  Number of shortest paths passing through the node.
Closeness centrality    CC            [12]  Inverse of avg. distance to other nodes.
Eigen centrality        EC            [14]  Relative degree centrality influenced by the score of the neighbors.
Page rank               PR            [15]  Like EC, but accounts for directed networks too.
K-shell decomposition   KS            [16]  Reveals core-periphery structure by organizing the nodes into buckets.
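To make these definitions concrete, the metrics can be computed for a small reference graph with off-the-shelf tools. The following is a minimal sketch using networkx on Zachary's Karate club network (also analyzed later in the paper); note that networkx labels its nodes 0-33, so the paper's node 34 corresponds to node 33 here. Percolation centrality is omitted since it additionally requires per-node percolation states.

```python
import networkx as nx

# Zachary's Karate club (34 nodes, labeled 0..33 by networkx, so the
# paper's node 34 corresponds to node 33 here).
G = nx.karate_club_graph()

metrics = {
    "DC": nx.degree_centrality(G),
    "BC": nx.betweenness_centrality(G),
    "CC": nx.closeness_centrality(G),
    "EC": nx.eigenvector_centrality(G, max_iter=1000),
    "PR": nx.pagerank(G),
    "KS": nx.core_number(G),  # k-shell index of every node
}

# Print the five best-ranked nodes for each metric.
for name, scores in metrics.items():
    top5 = sorted(scores, key=scores.get, reverse=True)[:5]
    print(name, top5)
```

On this graph, degree, eigen and page rank centralities all rank node 33 (the paper's node 34) first, while betweenness and closeness rank node 0 (the paper's node 1) first, illustrating how the measures emphasize different perspectives.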

Community Detection Methods
In the network science literature, there are various definitions of the concept of community, all of them centered around the idea that nodes inside a community have more connections among themselves than with the outside network [17]. Thus, a community is a group of nodes (a set of nodes) with the following properties: they form a connected subgraph (there exists a path from each node to every other node); nodes in community A have common properties, based on some particular similarity measurements (topical and topological features); and nodes in community A have more properties in common with other nodes in A than with those found in community B, i.e., an interior link occurs with higher probability than an outer link. Consequently, the problem of community detection consists in finding a partition of the node set N, resulting in a set of disjoint communities whose union is N (the disjointness criterion is not always required, e.g., in the case of overlapping communities).
Defining the community structure of the network means, in fact, determining a partition of the nodes. Since the number of such partitions grows exponentially with the number of nodes, the literature on community detection is focused on the complexity of the algorithms applied. A recent survey paper [18] offers a valuable overview of the field of community detection. According to [18], the complexity of detecting communities depends on the nature of the communities (static, dynamic, overlapping), as well as on the network's topology (weighted and/or directed graphs). One could consider the problem as a clustering of the nodes, based on similarity measures in the network [19]; it may be worth ignoring the edge direction as well [20]; or one could focus on determining leader nodes based on centrality measures and build the communities around the leaders [21]. We summarize the state-of-the-art methods in community detection below.
Generally, very little information is known in advance about a network's community structure. Thus, a traditional method is to assume that the network has a hierarchical topology (such as a social network, for example) and determine a multilevel division of the network; in this case, hierarchical clustering methods can be used. Hierarchical clustering is based on a similarity matrix containing node pair similarities. We can find and group together similar nodes in two different manners: by joining the nodes (agglomerative) or by separating them (divisive). The agglomerative, bottom-up approach starts with each node being a separate community and continues to merge the most similar communities, based on the cluster-similarity measure, i.e., linkage type (single-, complete-, average-linkage). One such example is the Ravasz algorithm [22], with a complexity of O(n²); similarity matrix calculation for hierarchical methods is expensive in general. The divisive, top-down approach works in the reverse manner: it starts with all nodes being in one community and repeatedly removes the highest centrality links until no link remains. Since the links are removed according to highest centrality, it is expected that high centrality links connect nodes from different communities, while low centrality links connect nodes in the same group. This can be accomplished with several metrics: link betweenness centrality; or random walk betweenness, the probability that the link l_ij was traversed by a random walker on a path from n_i to n_j. Since the latter metric is computationally expensive (O(n³), [2]), link betweenness is preferred as a divisive method. A representative divisive method is the one proposed by Girvan and Newman in 2002 [23]. The algorithm uses the edge betweenness centrality measure to repeatedly remove the highest centrality links. The algorithm's complexity depends on the centrality measure, which is O(ln) (l denoting the number of edges). Removing all the links means an iteration over the links, resulting in O(l²n); for a sparse network (l ≈ n), this is O(n³).
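The divisive Girvan-Newman procedure described above can be sketched with the reference implementation shipped in networkx, which yields one dendrogram level per edge-removal round. This is an illustrative sketch on the Karate club network, not the paper's experimental setup.

```python
import networkx as nx
from networkx.algorithms.community import girvan_newman, modularity

G = nx.karate_club_graph()

# Each next() removes highest edge-betweenness links until the number of
# connected components grows, yielding one level of the divisive dendrogram.
levels = girvan_newman(G)
first_split = next(levels)  # the first division, into two groups

print([sorted(c) for c in first_split])
print("Q =", round(modularity(G, first_split), 3))
```

The first split of the Karate club network closely matches the real-world fission of the club into two factions, a classic sanity check for divisive methods.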
Another approach is to use an agent which performs random walks in the network. One such algorithm is Walktrap, which is essentially an agglomerative method, building the node similarity matrix based on random walks. Since a similarity matrix update is needed in every iteration, the complexity is O((l + n)²n), or O(n³) for sparse networks [2,24]. Another key example is Infomap [25], exploiting the idea that a random walker tends to be trapped within a group of densely connected nodes (a community) before it exits to another group of nodes. The task is to minimize the length of the Huffman-encoded path of the walker by assigning the groups a prefix and reusing the codes for the nodes within the communities. Similar to another algorithm (i.e., Louvain), nodes are repeatedly moved from one community to another based on a target cost function: while Louvain uses modularity, Infomap uses the length of the Huffman code of the random path. The complexity is O(l log l) [2].
Another traditionally accepted approach is to maximize the basic community evaluation metric, the partition's modularity. Modularity is an intuitive and popular metric, which measures how well the nodes are grouped together, and several algorithms are based on the idea of maximizing the overall modularity of the communities. The FastGreedy algorithm builds on the maximum modularity hypothesis: the setup that results in the highest modularity score is the optimal community structure of the network. The algorithm starts with each node forming its own community. In each iteration, for each community pair, the change in modularity is calculated as if the two communities were merged (resulting in an agglomerative method), and the pair that yields the highest improvement of modularity is merged. The complexity is O((l + n)n) [2]. Clauset-Newman-Moore (CNM) [26] applies the same logic as FastGreedy, but with some special data structures achieves a complexity of O(l d log n), where d is the depth of the dendrogram. The Louvain algorithm [27] is a preferable option for large networks, due to its O(l) complexity. It starts with each node being its own community, and the concept is to place a node n_i into one of its neighboring nodes' communities so as to maximize the modularity change. Depending on the implementation details (https://sites.google.com/site/findcommunities/, accessed on 12 July 2021), the complexity is at most O(l log l) (or O(n log n) for sparse networks) [2]. Probabilistic approaches have proved to be successful in optimizing the modularity of the network [17,28,29,30].
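A minimal sketch of CNM-style greedy modularity maximization, using the networkx implementation (the community count and Q value refer to the Karate club network, not to the paper's datasets):

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities, modularity

G = nx.karate_club_graph()

# CNM: start from singleton communities and repeatedly merge the pair of
# communities producing the largest modularity gain.
parts = greedy_modularity_communities(G)
print(len(parts), "communities, Q =", round(modularity(G, parts), 3))
```

On the Karate club network this yields three communities; the resulting Q is close to the known optimum for this graph, illustrating the maximum modularity hypothesis in practice.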
The label propagation algorithm (LPA) was first published in 2007 [31], announcing an efficient community detection method with a complexity of O(l d log n), where d is the depth of the dendrogram. The method starts with each node forming its own community. In every iteration, each node takes the label of its neighbors, based on a majority vote. The paper compares the synchronous and asynchronous methods of node label updating. In order to avoid oscillation of labels and to ensure the termination of the algorithm, the authors propose asynchronous label updates, one node at a time. The algorithm stops when no more node label updates are needed, i.e., each node carries the majority label of its neighbors. Other papers improved the accuracy of the initial LPA by applying heuristics to the propagation, based on local edge betweenness [32,33]. The article from 2019 [33] uses this improved method on weighted graphs, calculating local edge betweenness [34] to prioritize the propagation of labels and reaching a complexity of O(n(l/n)^h) (using h = 2 depth edge betweenness), meaning near-linearity in the number of nodes for sparse networks. Edge betweenness is used as a heuristic to prioritize the selection of low-score edges; the concept is that links with high edge betweenness tend to link nodes of different communities.
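The asynchronous variant described above can be sketched with the networkx implementation; ties in the majority vote are broken at random, hence the fixed seed for reproducibility.

```python
import networkx as nx
from networkx.algorithms.community import asyn_lpa_communities

G = nx.karate_club_graph()

# Asynchronous LPA: every node starts with a unique label and, one node
# at a time, adopts the majority label among its neighbors until no
# further updates are needed.
labels = list(asyn_lpa_communities(G, seed=42))
print(len(labels), "communities:", [sorted(c) for c in labels])
```

Because the voting order and tie-breaks are random, different seeds may produce different community counts, which is the nondeterminism discussed below.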
All of the algorithms mentioned above are applicable to simple graphs, and some of them, with the necessary adjustments, are capable of solving the community detection problem for weighted and/or directed networks. Detailed benchmark results are described in [10], a valuable article from which one key point is that all modularity-based approaches can be extended to weighted and/or directed networks as well, by modifying the formula of modularity [17]. In the case of LPA, link weight and directionality can also be taken into consideration by weighting the vote of each neighboring node, depending on the implementation and the practical use case. Random walk methods can also be adapted to directed networks; for example, Infomap can be applied to directed and weighted networks too [10].
It is important to note that not all community detection algorithms are deterministic. In fact, any method based on a random walker or on probabilistic optimization is nondeterministic, e.g., Infomap, WalkTrap, Louvain and LPA, whereas the GN method and its improved version, CNM, are deterministic algorithms. The algorithms' complexity is also a key factor determining their application during the experiments.
Regarding the evaluation of partitions, along with strictly topological properties, the quality of a partition can also be measured by considering the ground-truth labels of the nodes (i.e., which community they belong to). For this purpose, metrics from information theory can be used (Shannon entropy, mutual information, homogeneity, completeness). The modularity of a partition [35] is a ratio in [−1, 1] [36], measuring the difference between the density of links within the communities and the expected density in a randomized network's partition. The coverage and performance of a partition [17] measure how dense the intra-community edges are and how well the communities separate non-linked nodes. The embeddedness of a node is a ratio in [0, 1], measuring how many of the node's neighbors are in the same community [37]. Its formula, e_i = k_i^in / k_i, underlines this metric's importance as a node ranking method within a community. The hub dominance of a community C is a ratio in [0, 1], measuring the existence of a hub node connecting to as many other nodes in C as possible [37]. Its formula, h_C = max(k^in) / (|C| − 1), suggests that this metric can be used as a node importance criterion after the community structure of the network is determined. Each node has a hub dominance score within a community, showing the extent to which it represents the community; the node having the maximum of these scores will be the hub-dominant node of the community.
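The two formulas translate directly into code. The sketch below assumes an unweighted, undirected graph, with the partition supplied by any of the algorithms above (CNM is used here only as a convenient deterministic choice).

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def embeddedness(G, node, community):
    """e_i = k_i^in / k_i: fraction of the node's neighbors inside its community."""
    neighbors = set(G.neighbors(node))
    return len(neighbors & set(community)) / len(neighbors) if neighbors else 0.0

def hub_dominance(G, community):
    """h_C = max(k^in) / (|C| - 1): how close the best-connected member
    comes to linking every other node of the community."""
    C = set(community)
    if len(C) < 2:
        return 1.0
    return max(len(set(G.neighbors(v)) & C) for v in C) / (len(C) - 1)

G = nx.karate_club_graph()
parts = greedy_modularity_communities(G)  # any partition works here
for C in parts:
    # The hub-dominant node is the member with the most intra-community links.
    hub = max(C, key=lambda v: len(set(G.neighbors(v)) & set(C)))
    print("size", len(C), "hub", hub, "h_C =", round(hub_dominance(G, C), 2))
```

As a sanity check, a star graph treated as one community has hub dominance exactly 1.0, since its center links to every other member.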

Proposed Analysis of the Important Nodes
This paper investigates the important nodes in a network by combining the insights offered by centrality measures and community detection methods. After we determine the ideal partitioning of the nodes according to different community detection algorithms, we match the previously calculated centrality measures against the hub-dominant nodes of each community. We investigate whether some of these metrics yield the highest scores across the entire network precisely for these hub nodes.
Our experimental analysis aims to answer a list of research questions. These research questions serve as guidelines for the reader in understanding and following the study's goals with respect to the methodology steps. Considering these aspects, we formulate the following research questions.
• (RQ1) Analysis of important nodes through centrality metrics: what is the best way to find a network's important nodes?
• (RQ2) Analysis of important nodes from the perspective of the community structure of the network: do the community-based node metrics yield a useful node ranking?
• (RQ3) Is there any correlation between the various node centrality metrics and community-based node importance?
The first two research questions target the analysis of a network from two independent standpoints. The third research question, however, is the most important aspect of this study, since it investigates a possible improvement of the node importance ranking strategies by discovering a relation between the above mentioned methods.
In order to address RQ1, we apply the aforementioned centrality algorithms: degree, closeness, betweenness, eigenvector, percolation and page rank centralities, as well as k-shell decomposition, in order to find the most important nodes in the networks that we are studying. In order to address RQ2, we analyze the networks by determining their community structure with the following community detection algorithms: the greedy modularity-based algorithm [26]; LPA [31]; Louvain [27]; Infomap [25]; and Infomap', which is initialized with the resulting partition of LPA. Since we lack the ground-truth partitioning of the nodes in the datasets, we focus on strictly topological evaluation in our experiments, calculating the most intuitive metrics regarding the accuracy of community detection: modularity, coverage and performance. Further, we compare the number of detected communities and their sizes. Furthermore, we analyze the community structure by determining the hub-dominant nodes and the embeddedness of the nodes, focusing on their distribution.
The proposed approach to address RQ3 is to collect the important nodes obtained by community detection and examine whether they are also important nodes according to some node centrality criterion. We executed these steps on several real, public networks, as well as on generated graphs, following existing network benchmarks.
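This matching of hub-dominant nodes against top centrality nodes can be sketched end-to-end. The sketch uses CNM communities on the Karate club network for brevity (the paper's experiments also use LPA, Louvain and Infomap) and only two of the centrality criteria.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def hub_dominant_nodes(G, communities):
    """One hub per community: the member with the most intra-community links."""
    return [max(set(C), key=lambda v: len(set(G.neighbors(v)) & set(C)))
            for C in communities]

G = nx.karate_club_graph()
hubs = set(hub_dominant_nodes(G, greedy_modularity_communities(G)))

# Top-5 nodes network-wide, for two of the centrality criteria.
top5 = lambda s: set(sorted(s, key=s.get, reverse=True)[:5])
central = top5(nx.degree_centrality(G)) | top5(nx.betweenness_centrality(G))

print("hub-dominant nodes:", sorted(hubs))
print("also top-central:", sorted(hubs & central))
```

A non-empty intersection of the two sets is exactly the kind of overlap that RQ3 asks about; on the Karate club network, node 33 (the paper's node 34) appears in both.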

Experiments and Results
This section presents the experiments performed on several real-world, publicly available networks in order to analyze the important nodes. The following abbreviations are used for the computed measures: DC-Degree Centrality, BC-Betweenness Centrality, CC-Closeness Centrality, PC-Percolation Centrality, EC-Eigen Centrality, PR-Page Rank, HD-Hub Dominance. Table 2 presents the networks (http://networkrepository.com, accessed on 12 July 2021) used for our experiments: the Karate club network, the Dolphin network, the Politicians' Facebook pages network and the Google+ social network. The motivation for selecting these networks is to offer an analysis on datasets of different sizes; furthermore, they are frequently used in the specialized literature. During the first part of our experiments, we inspect the results of the node centrality metrics and community detection algorithms for the real-world networks presented in Table 2. Then, we confirm the intuitions gained through these experiments by running the experiments on randomly generated networks, following the Lancichinetti-Fortunato-Radicchi (LFR) benchmark [38] and the Erdős-Rényi (ER) network model [39].

Analysis Based on Centrality Scores
We show the first five nodes, according to their centrality ranking, in Tables 3 and 4; all of these nodes are in the K-shell core layer (K-shell is a method for dividing the nodes into two broad categories, core and periphery; the ones in the core layer can be considered the most important). We chose to present only the first five nodes for each network because this makes it easier to see the correlation between the nodes of each centrality.
In the Karate network, we can observe that the majority of our methods found nodes 1, 34 and 3 as having the best centrality scores. When analyzing the Dolphin network, a slightly larger one, we see that again a handful of nodes have the best scores: 15, 37 and 38. The Politicians network shows more diversity in its top 5 nodes, due to the fact that the network is much larger than the first two. The last network used in the experiments strengthens this point of view. In conclusion, we observe that in small networks the same high-centrality nodes keep reappearing, while in larger networks this no longer happens and diversity starts to appear.

Table 3. Top five nodes per centrality measure in the Karate network.

Rank    DC  BC  CC  PC  EC  PR
First   34   1   1   1  34  34
Second   1  34   3  34   1   1
Third   33  33  34  33   3  33
Fourth   3   3  32   3  33   3
Fifth    2  32   9  32   2   2

Table 4. Top five nodes per centrality measure in the Dolphin network.

Rank    DC  BC  CC  PC  EC  PR
First   15  37  37  37  15  15
Second  38   2  41   2  38  18
Third   46  41  38  41  46  52
Fourth  52  38  21  38  34  58
Fifth   34   8  15   8  51  38

Analysis Based on Community Detection
During the community detection experiments, our attention is also focused on the complexity of the algorithms, since in a real-world scenario, where a network could have millions of edges, any algorithm quadratic in n (and especially in l) could be impractical to use. According to the Girvan-Newman and Lancichinetti-Fortunato-Radicchi benchmarks [40], the Infomap and Louvain algorithms have the best performance regarding running time and result accuracy. The Louvain and LPA algorithms are also built into the Neo4j graph database engine, being the first production-ready algorithms of the engine (https://neo4j.com/docs/graph-algorithms/current/algorithms/community/, accessed on 12 July 2021).
The results of the different community detection algorithms on the Politicians dataset are shown in Figure 1. It is interesting to analyze the subtle differences between these communities; the colors used in the plot correspond to the community IDs. Visible structural differences can be observed, even on this moderately small network. CNM seems to yield a good community structure; however, the central red region is dispersed and could be split into three different communities (this may be explained by the nature of the algorithm to merge two clusters; adjusting the parameter of the dendrogram cut will influence the final community structure). Furthermore, there are some outlier nodes, e.g., the red node in the leftmost blue region. LPA solves the separation of the aforementioned central red community, but there are cases where such separation may not be justified, e.g., the leftmost region, or the top right green and blue communities; again, this may be fine-tuned by varying the voting algorithm. Infomap yields the smallest number of communities, which can be seen on the plot too, having many separate groups with the same color, i.e., community ID. The explanation may lie in the fact that, for small communities, the length of the Huffman-coded path does not become any smaller by splitting these groups (the size of the communities together does not result in longer Huffman codes per node); on the contrary, it may become longer, since a new entrance and exit code must be added to the path. Finally, the Louvain communities seem to capture the network's community structure in the best possible way, with clear separations, but not in an excessive manner, as in the many cases of LPA.

The numerical statistics regarding the number of communities, their sizes, densities and topological properties are shown in Figure 4, as several bar plots and line series.
Each bar plot refers to a different community detection algorithm applied on the network: from left to right, CNM, LPA, Louvain, Infomap, and Infomap with a custom initial partition, that of the output of LPA. The algorithms' theoretic complexity is reflected in the first plot: CNM is by far the most time-consuming, while LPA, Louvain and Infomap are the best in this regard. Also, the number of communities is much larger in the case of LPA compared to the others; this can also be observed in their plots in Figure 1. Regarding the modularity, coverage and performance scores, we can state that the custom partition-based Infomap does not outperform the simple one; quite the contrary. The second row of plots shows two line plots, each having five series according to the different algorithms, and two other plots representing the community size histogram of Louvain. The first line plot shows the max., average, median and min. size of the communities, while the second shows the same statistics for their densities. We can observe that LPA produces the densest communities, which can be justified by the larger number of communities, i.e., a more thorough separation of the nodes. These values originate from a single evaluation; although these algorithms are non-deterministic (except CNM), results across different runs showed only insignificant differences.

We list the hub-dominant nodes of each community according to each algorithm in Table 5 for the smallest networks and in Table 6 for the Politicians network. We also analyzed the distribution of node embeddedness and hub dominance. Figure 5 shows a scatter plot of the community sizes, scaled by their hub dominance score. It can be observed that, naturally, the smaller the community, the larger the hub dominance. Cases where a large community has a large hub dominance are interesting scenarios, probably indicating the existence of important nodes inside the community.
Similarly, the communities' node embeddedness histogram can be seen in the rightmost plot, demonstrating that this network's community structure results in highly embedded nodes, with their degrees consisting mostly of connections inside their own community.

Fusion of Node Centralities and Louvain Communities
We conclude our experiments with a fusion of the results yielded by Louvain (due to its ideal modularity structure, as depicted in Figure 1) and those obtained by calculating the different node-level centrality metrics. Figure 6 shows the Louvain communities of the Politicians network, with nodes scaled according to their degree; for each centrality metric, the top 20 nodes are extracted and scaled by the value of the respective metric. By visual analysis, one can observe that several centrality measures yield overlapping node sets. For example, the central orange group of nodes, with high-degree nodes on its periphery, is detected by the betweenness, closeness, page rank and eigen centralities as top 20 in the entire network. Furthermore, one of them is exactly the hub of the respective community, having a hub dominance score of 0.6222. Another example is the central green node, the hub-dominant node of that green community, which is also detected by all other centrality metrics, having outstanding closeness and page rank scores. These observations are formulated numerically in Table 7. In this table, we list those hub-dominant nodes of the respective network which appear in the top 20 centralities, indicating their rank in parentheses (i.e., 1 meaning that the node has the highest centrality score in the network). Table 7 indicates that the hub-dominant nodes in the Karate network are nodes 1, 6, 34 and 25, with nodes 1 and 34 having top centrality metrics as well (see Section 5.2). Their hub dominance scores are also among the highest of these four hub nodes. It is interesting to notice that although node 25 is a hub node, its dominance score is only 0.6 (compared to 0.9 for node 1), and none of the centrality measures ranks it as a top important node; the same holds for node 6.
Furthermore, the susceptible-infected-recovered (SIR) model [41] can be engaged to investigate the spreading dynamics in the network, starting from the important nodes determined by the previous centrality- and community-based approaches. The propagation ability of a node is a representative property to study in this context. In Figure 7 we compare the evolution in time of the information spreading for the node with the highest betweenness centrality score (left) and the node with the highest hub-dominance score (right). The green nodes are the ones not containing any information (susceptible), the red ones contain information (infected) and the grey ones had information but managed to recover from it (recovered). The graphs attached to each frame show the evolution over time of each type of node (green, red and grey). Figure 7 only depicts the first iterations to illustrate how the information spreads from the initially infected nodes to the rest. These empirical results have been obtained for a particular SIR model with a probability of 0.2 for an infected node to infect its neighbors and a recovery rate of 1.2. Comparing each pair of frames, both nodes are confirmed as important in this small network; however, a thorough statistical validation is still required in order to assess the importance of highly ranked centrality nodes and hub-dominant ones, further taking into account the stochastic nature of the spreading models.
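As an illustration of the spreading process, the following is a minimal discrete-time SIR sketch in pure Python. It is a simplified assumption-based variant of the model in [41]: updates are synchronous, and a recovery rate above 1 behaves as certain recovery after one step. The toy graph, the seed node and the function names are hypothetical.

```python
import random

def sir_step(adjacency, state, beta, gamma, rng):
    """One synchronous SIR update. state maps node -> 'S', 'I' or 'R'."""
    new_state = dict(state)
    for v, s in state.items():
        if s != 'I':
            continue
        # Each infected node infects its susceptible neighbours with prob. beta...
        for u in adjacency[v]:
            if state[u] == 'S' and rng.random() < beta:
                new_state[u] = 'I'
        # ...and then recovers with prob. gamma (a rate >= 1 recovers for sure).
        if rng.random() < gamma:
            new_state[v] = 'R'
    return new_state

rng = random.Random(42)
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
state = {v: 'S' for v in adj}
state[0] = 'I'  # seed the spreading at the candidate important node
for _ in range(10):
    state = sir_step(adj, state, beta=0.2, gamma=1.0, rng=rng)
print(state)
```

Running such a simulation many times from each candidate seed, and comparing the final fraction of recovered nodes, is one way to quantify the propagation ability discussed above.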

Benchmark Testing on Lancichinetti-Fortunato-Radicchi and Erdős-Rényi Networks
In order to validate our hypothesis of the correlation between hub-dominant nodes and important nodes having top centrality metrics, we carried out experiments on LFR networks [38] and ER networks [39] of different sizes, following the logical flow of the analysis described above, but now on many randomly generated networks. The steps are as follows: rank the nodes according to the centrality metrics, determine the community structure of the network with the Louvain algorithm, determine the hub-dominant node of each community and, finally, verify whether these hub nodes are among the top centrality nodes according to any of the mentioned criteria. We mention that the LFR networks already have a planted partition structure. In order to obtain consistent results across the two random network generation models, regardless of the underlying partitioning method, we use the Louvain partitioning of the LFR networks (instead of their planted partitions), just as in the case of the ER networks.
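The verification step above can be sketched as follows. For brevity, degree is used as a stand-in for an arbitrary centrality ranking and the partition is assumed to be already computed (e.g., by Louvain), so the function names and the toy two-star network are illustrative assumptions only.

```python
def degree_ranking(adjacency):
    """Nodes sorted by degree, highest first (stand-in for any centrality)."""
    return sorted(adjacency, key=lambda v: len(adjacency[v]), reverse=True)

def hub_nodes(partition, adjacency):
    """Hub-dominant node of each community: the node with most internal links."""
    return [max(c, key=lambda v: len(adjacency[v] & c)) for c in partition]

def hub_overlap(partition, adjacency, k):
    """Fraction of hub-dominant nodes found among the top-k ranked nodes."""
    top_k = set(degree_ranking(adjacency)[:k])
    hubs = hub_nodes(partition, adjacency)
    return sum(h in top_k for h in hubs) / len(hubs)

# Toy network: two stars (centred on nodes 2 and 3) bridged by the edge 2-3.
adj = {0: {2}, 1: {2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3}, 5: {3}}
partition = [{0, 1, 2}, {3, 4, 5}]
print(hub_overlap(partition, adj, k=2))  # -> 1.0
```

Aggregating this overlap fraction over many generated network instances yields percentages of the kind reported in Tables 9 and 10.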
The importance of using both of these network models is to address RQ3. Since ER networks do not have an intrinsic community structure, as stated by the random hypothesis also described in [2], the joint results of the experiments run on the LFR and ER models confirm that hub-dominant nodes tend to be among the top important nodes, especially in lightweight networks. The importance of the random hypothesis in this context is that, regardless of whether the network has an inherent partition or not, after running the Louvain algorithm the obtained hub nodes will be among those having the highest centrality metrics.
The LFR networks were generated using the following parameters: power-law exponent of the degree distribution 3, power-law exponent of the community-size distribution 1.1, and fraction of inter-community edges incident to each node 0.1. The degree intervals and the minimum number of communities (c_min) are shown in Table 8. The ER networks were generated using the G(n, p) random model, with n set to 10, 50 and 100, and p set to 0.3 and 0.8. Tables 9 and 10 show the results of this experiment for four network categories of different sizes. Each entry gives the percentage of hub-dominant nodes found among the top k nodes according to the respective centrality criterion. This percentage is computed as follows: for each of the four network categories, several instances were generated, each instance having a set of hub-dominant nodes and a ranking of important nodes. The percentage is calculated over all of these instances by determining how many of their hub nodes were among the corresponding top ranking, separately for the top 1-3, top 4-6 and top 7-10 nodes. Due to the computational complexity of some node centrality metrics (e.g., betweenness centrality), only a small number of instances could be evaluated for larger networks, i.e., only 7 networks of size n = 500. Thus, the results corresponding to larger networks may be biased, but we can still formulate our observation based on the comparison with smaller networks.
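For reference, a minimal G(n, p) generator matching the ER setting above can be sketched in a few lines; in practice a graph library's ER generator would be used, so this sketch and its names are purely illustrative.

```python
import random

def erdos_renyi(n, p, rng):
    """G(n, p): include each of the n*(n-1)/2 possible edges with prob. p."""
    adjacency = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adjacency[u].add(v)
                adjacency[v].add(u)
    return adjacency

rng = random.Random(0)
g = erdos_renyi(50, 0.3, rng)
mean_degree = sum(len(nbrs) for nbrs in g.values()) / len(g)
print(mean_degree)  # expected value is p * (n - 1) = 14.7
```

The uniform edge probability is exactly what the random hypothesis relies on: no community structure is planted, so any detected partition is an artifact of the partitioning algorithm.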
Although some of the experiments were run on statistically too few networks in the case of the LFR model, the Erdős-Rényi experiment was carried out on a statistically representative sample size of 1000 networks for each parametrization. We may conclude that for small networks, the hub-dominant nodes are among the top important nodes of the network, but as the network grows, these hubs do not necessarily have top centrality metrics. Hence, a comprehensive analysis of the network should not be limited to the different node-level centrality metrics, but should also take the network's community partitioning into account.

Final Remarks
According to the results discussed above, we can formulate the following responses to the aforementioned research questions. The different node centrality metrics select different nodes as important; no single predefined method can be chosen as the best criterion, and the decision must be made based on the application context.
Community detection helps the analysis in several ways. As the complexity of some of the prevalent algorithms is close to linear, this kind of analysis may be more efficient than determining node importance in the global context of the network. Furthermore, the combined plots of the communities and centralities, the tabular data on the fusion of hub-dominant nodes and top centrality scores, and the benchmark tests all suggest that there is a relation between the community-based important nodes (i.e., hub-dominant nodes) and several centrality criteria; however, only in lightweight networks. This observation can be explained by the fact that, as a network grows, a simple node partitioning may no longer capture the full structure of the network; thus, node centrality metrics based on full-graph traversals and paths yield other important nodes than those obtained by selecting just the representative item of each community.

Conclusions and Future Work
This paper analyzes different methods for node importance ranking using centrality scores and community detection, then investigates the relation between these methods by carrying out multiple experiments on both real and synthetic networks. In a complex network, each node centrality metric offers a different perspective on the dataset, and the most precise analysis can be obtained by taking into consideration not only a multitude of these metrics, but also the community structure of the whole network. In order to verify this claim, several experiments have been carried out on four real-world networks and on random networks of different sizes, and the top important nodes have been examined. The results from the centrality measures and community detection show that these methods yield common important nodes and further suggest that the relation between hub-dominant nodes determined through community detection and highly ranked centrality nodes holds particularly for lightweight networks.
The proposed approach is applicable to undirected and unweighted networks as well. Since the experimental results confirm that some important nodes are found by both measurements, future studies should focus on an aggregate approach combining dynamic characteristics and network structure in order to determine node significance. By identifying node chaining patterns, one could obtain interesting insights into the nature of the network. These patterns may be the subject of further network motif analysis or of graph convolutional neural network applications.