Leveraging Minimum Nodes for Optimum Key Player Identification in Complex Networks: A Deep Reinforcement Learning Strategy with Structured Reward Shaping

Abstract: The problem of finding key players in a graph, also known as network dismantling or network disintegration, aims to find an optimal removal sequence of nodes (edges, substructures) such that functional indicators of the graph, such as the size of the giant connected component (GCC) or the pairwise connectivity, decline as rapidly as possible. This is a typical NP-hard problem on graphs, and recent methods based on reinforcement learning and graph representation learning have addressed it effectively. However, existing reinforcement-learning-based key-player-identification algorithms often need to remove too many nodes before no edges remain in the residual network. Using a minimum number of nodes while maintaining or surpassing the performance of existing methods is therefore a worthwhile research problem. To this end, a novel algorithm called MiniKey is proposed, which employs a specific deep Q-network architecture for reinforcement learning, a novel reward-shaping mechanism based on network functional indicators, and the graph-embedding technique GraphSage to transform network nodes into latent representations. Additionally, a technique dubbed 'virtual node technology' is integrated to capture the overall feature representation of the whole network. The algorithm can be trained effectively on small-scale simulated graphs while also scaling to large-scale real-world networks.
Importantly, experiments on six simulated datasets and six real-world datasets demonstrate that MiniKey achieves the best overall performance, balancing the effectiveness of key node identification against the number of nodes used. This holds potential for real-world applications such as curbing misinformation spread in social networks, optimizing traffic in transportation systems, and identifying key targets in biological networks for targeted interventions.


Introduction
Complex networks hold substantial significance given their extensive reach and impact on diverse aspects of our lives. At the heart of complex networks lie the key players, also known as influential players [1], vital players [2], or critical players [3]. These players are nodes, edges, or substructures that, when removed, can substantially degrade a specific functionality of the network [4]. Identifying these key players has profound implications in a variety of domains, such as epidemic control [5], drug design [6], viral marketing [7], criminal network analysis [8], and combat network disintegration [9].
MiniKey approaches the problem from the multi-objective optimization field, navigating the balance between functionality optimization and minimal node usage. By harnessing the power of DRL and reward shaping, it ensures a more efficient and robust learning process, resulting in the identification of fewer key players while still achieving the optimal effect on the network's functionality.
To illustrate the effectiveness of MiniKey, we showcase its application in the domain of crime networks in Figure 1. The figure shows how MiniKey compares with other methods in breaking network connectivity while minimizing the number of removed nodes. In a direct comparison on the 9/11 terrorist network, MiniKey outperforms the original FINDER model: with the same number of removed nodes, it leaves 3 edges in the network where FINDER leaves 6. This demonstration emphasizes MiniKey's potential for real-world problems in which a network must be dismantled effectively while minimizing node usage.

Figure 1. Leveraging minimum nodes for optimum key players' identification in complex networks. (a) The 9/11 terrorist network with 62 terrorists (nodes) and 159 relations (edges). The size of a node corresponds to its degree. (b) The HDA algorithm adaptively removes 28 nodes (grey); only 3 edges remain in the network, but this algorithm has a poor ANC score. (c) The FINDER algorithm removes the same number of nodes as HDA, and 6 edges remain in the network. (d) The MiniKey algorithm removes the same number of nodes as HDA; only 3 edges remain in the network and an optimal ANC score is achieved.

Accumulated Normalized Connectivity (ANC)
Given a graph $G(V, E)$ with vertex set $V$, edge set $E$, and a connectivity metric $\sigma$, where each edge in $E$ is a pair of distinct vertices $(u, v) \in V \times V$, the learning target in key players' identification is to find an optimal node removal sequence $(v_1, v_2, \cdots, v_n)$ which minimizes the following accumulated normalized connectivity (ANC):

$$R(v_1, v_2, \cdots, v_n) = \frac{1}{n} \sum_{k=1}^{n} \frac{\sigma(G \setminus \{v_1, v_2, \cdots, v_k\})}{\sigma(G)} \tag{1}$$

Here, $G \setminus \{v_1, v_2, \cdots, v_k\}$ is the graph that remains after the first $k$ nodes of the sequence have been removed.
In this paper, the connectivity metric $\sigma$ represents the size of the largest connected component remaining in the network, in which case the problem is referred to as the network dismantling problem (NC) [31]. This problem involves identifying a sequence of node removals that minimizes the size of the largest connected component remaining in the network. By addressing this issue, we aim to offer new insights into the process of network disintegration and propose more efficient strategies for identifying key players in complex networks.
It is worth noting that the connectivity metric σ can also represent various other network characteristics. For instance, it could represent the sum of the connectivity between nodes, referred to as the critical node problem (CN) [32]. It could also indicate the average geodesic distance [33] in the network among other possible measures.
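As an illustration of the ANC definition, the following minimal sketch computes the score of a removal sequence when $\sigma$ is the size of the largest connected component. The adjacency-dictionary representation and the function names `largest_component_size` and `anc` are assumptions made for this example and are not taken from the paper's implementation.

```python
from collections import deque

def largest_component_size(adj, removed):
    """Size of the largest connected component among nodes not in `removed`."""
    seen = set(removed)  # removed nodes are never visited
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:  # breadth-first search over one component
            u = queue.popleft()
            size += 1
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        best = max(best, size)
    return best

def anc(adj, sequence):
    """Average, over the removal steps, of sigma(remaining graph) / sigma(G)."""
    sigma_0 = largest_component_size(adj, set())
    removed = set()
    total = 0.0
    for v in sequence:
        removed.add(v)
        total += largest_component_size(adj, removed) / sigma_0
    return total / len(sequence)

# Hypothetical toy graph: a star whose hub 0 is joined to leaves 1..4.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
hub_first = anc(star, [0, 1, 2, 3, 4])
hub_last = anc(star, [1, 2, 3, 4, 0])
print(hub_first, hub_last)
```

On this toy star graph, removing the hub first yields a lower ANC than removing it last, which is the behaviour the metric is meant to reward.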

Solution Length Ratio (SLR)
Given a graph $G(V, E)$, we define the solution length ratio (SLR) under a certain strategy as the ratio of the number of nodes that must be removed to disintegrate the network until no edges remain to the total number of nodes in the original network. This can be mathematically expressed as follows:

$$\mathrm{SLR} = \frac{|V_S|}{|V|} \tag{2}$$

Here, $|V_S|$ is the number of nodes required to dismantle the network until no edges remain, and $|V|$ is the number of nodes in the original network $G(V, E)$. The importance of the SLR lies in its ability to measure the resource efficiency of a network-disintegration strategy. In practical scenarios, it is often preferable to achieve network disintegration with as few node removals as possible due to resource constraints. Hence, a strategy with a lower SLR is more desirable, as it indicates less resource consumption in terms of node removals. When optimizing based solely on this metric, the problem can be transformed into the minimum vertex cover problem (MVC) [34].
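The SLR of a given removal order can be computed directly from its definition. The sketch below is only illustrative: the function name `solution_length_ratio` and the adjacency-dictionary input are assumptions for this example.

```python
def solution_length_ratio(adj, sequence):
    """Fraction of nodes removed, in the given order, before no edges remain."""
    # Store each undirected edge once as a frozenset of its two endpoints.
    remaining_edges = {frozenset((u, w)) for u in adj for w in adj[u]}
    removed = 0
    for v in sequence:
        if not remaining_edges:
            break  # the network is already edgeless
        remaining_edges = {e for e in remaining_edges if v not in e}
        removed += 1
    if remaining_edges:
        raise ValueError("the sequence does not remove every edge")
    return removed / len(adj)

# Hypothetical toy graph: a chain 0-1-2-3-4.
chain = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
slr_alternate = solution_length_ratio(chain, [1, 3, 0, 2, 4])
slr_end_to_end = solution_length_ratio(chain, [0, 1, 2, 3, 4])
print(slr_alternate, slr_end_to_end)
```

The two orders dismantle the same chain, but removing every second node stops after two removals, while working from one end needs four.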

Pareto Frontier
In the field of multi-objective optimization, the Pareto frontier is the set of solutions (or, correspondingly, points within the objective function space) for which no other feasible solution can decrease any given criterion without simultaneously increasing at least one other criterion.
Formally, for a problem with $k$ objectives to minimize, a solution $X$ is said to dominate another solution $Y$ if, and only if: (1) for all objectives $i$, $1 \le i \le k$, the score of $X$ on $i$ is less than or equal to the score of $Y$ on $i$; and (2) there exists at least one objective $j$, $1 \le j \le k$, such that the score of $X$ on $j$ is strictly less than the score of $Y$ on $j$. The Pareto frontier then consists of the solutions that are not dominated by any other solution.
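The dominance relation translates almost literally into code. The following sketch filters a list of objective vectors down to its Pareto frontier; the (ANC, SLR) pairs are made-up values used purely for illustration, not results from the paper.

```python
def dominates(x, y):
    """True if x dominates y when every objective is to be minimized."""
    no_worse = all(a <= b for a, b in zip(x, y))
    strictly_better = any(a < b for a, b in zip(x, y))
    return no_worse and strictly_better

def pareto_front(points):
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical (ANC, SLR) scores for four methods; lower is better on both.
scores = [(0.20, 0.50), (0.30, 0.40), (0.30, 0.60), (0.25, 0.55)]
front = pareto_front(scores)
print(front)
```

This quadratic-time filter is adequate for comparing a handful of methods; it is not meant as an efficient general-purpose routine.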

Model of MiniKey
Here, we employ a typical encoder-decoder architecture to model the key node identification problem on graphs. Simultaneously, we utilize a reinforcement learning algorithm with structured reward-shaping to train the entire MiniKey model. The design of each part of the algorithm is as follows.

Encoding Process of MiniKey
The identification of key players in complex networks heavily depends on the feature representation of elements within the network. Therefore, an effective feature-learning model can enhance the performance of the algorithm. In this work, we employ GraphSage [35] to encode the nodes within the network into a latent representation. Furthermore, we utilize a technique known as virtual node technology to capture the overall feature representation of the entire network. The detailed algorithmic framework of our graph encoding process is outlined as follows.
Algorithm 1 is the encoding process of MiniKey. Given a graph $G(V, E)$, node features $X_v$, depth $K$, and learnable weight parameters $W_1 \in \mathbb{R}^{c \times p}$, $W_2 \in \mathbb{R}^{p \times (p/2)}$, and $W_3 \in \mathbb{R}^{p \times (p/2)}$, the purpose of this algorithm is to produce the node embeddings $z_v$ and the graph embedding $z_s$.

Algorithm 1: Encoding process of MiniKey
The algorithm starts by adding a virtual node $s$ that connects to all nodes in the graph. This node is referred to as the graph state. The embeddings $h_v^{(0)}$ for the nodes and the virtual node are then initialized: the Rectified Linear Unit (ReLU) function is applied to the product of the node features $X_v$ and the weight parameters $W_1$, and the result is normalized using the L2 norm. The algorithm then enters a loop that runs $K$ times, corresponding to the $K$ layers of the graph neural network. In each iteration, it calculates the new embeddings for each node and the virtual node. Inside this loop, there is a nested loop that runs for every node $v$ in the graph and the virtual node. For each node v, the embedding h
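The encoding loop described above can be sketched in plain Python as follows. Several details are assumptions made only for this example, because the text does not specify them: neighbour embeddings are aggregated by summation, each layer concatenates the $W_2$ projection of a node's own embedding with the $W_3$ projection of the aggregated neighbourhood (two halves of size $p/2$), ReLU and L2 normalization are reapplied after every layer, and the virtual node receives an all-ones feature vector. The names `encode`, `vec_mat`, and `VIRTUAL` are hypothetical.

```python
import math
import random

VIRTUAL = "__virtual__"  # hypothetical label for the virtual node s

def relu(vec):
    return [max(0.0, x) for x in vec]

def l2_normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def vec_mat(vec, mat):
    """Row vector (length r) times matrix (r x c), giving a row vector of length c."""
    return [sum(vec[i] * mat[i][j] for i in range(len(vec)))
            for j in range(len(mat[0]))]

def encode(adj, features, W1, W2, W3, K):
    """Return (node embeddings z_v, graph embedding z_s) after K layers."""
    # Copy the graph and connect the virtual node to every real node.
    graph = {v: set(nbrs) for v, nbrs in adj.items()}
    real_nodes = list(graph)
    graph[VIRTUAL] = set(real_nodes)
    for v in real_nodes:
        graph[v].add(VIRTUAL)
    feats = dict(features)
    feats[VIRTUAL] = [1.0] * len(W1)  # assumption: all-ones virtual features
    p = len(W1[0])
    # Layer 0: ReLU(X_v W1), then L2 normalization.
    h = {v: l2_normalize(relu(vec_mat(feats[v], W1))) for v in graph}
    for _ in range(K):
        new_h = {}
        for v in graph:
            # Assumption: sum aggregation over the neighbourhood.
            agg = [sum(h[u][i] for u in graph[v]) for i in range(p)]
            own_half = vec_mat(h[v], W2)   # length p/2
            nbr_half = vec_mat(agg, W3)    # length p/2
            new_h[v] = l2_normalize(relu(own_half + nbr_half))
        h = new_h
    z_s = h.pop(VIRTUAL)
    return h, z_s

rng = random.Random(0)
c, p, K = 2, 4, 3
W1 = [[rng.uniform(-1, 1) for _ in range(p)] for _ in range(c)]
W2 = [[rng.uniform(-1, 1) for _ in range(p // 2)] for _ in range(p)]
W3 = [[rng.uniform(-1, 1) for _ in range(p // 2)] for _ in range(p)]
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
features = {v: [1.0, float(len(nbrs))] for v, nbrs in triangle.items()}
z, z_s = encode(triangle, features, W1, W2, W3, K)
print(len(z), len(z_s))
```

Because the virtual node is adjacent to every real node, its embedding after $K$ layers mixes information from the whole graph, which is why it can serve as the graph-level representation.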

Decoding Process of MiniKey
We use a two-layer MLP to decode a state-action pair $(s, a)$ into a scalar value $Q(s, a)$ that predicts the maximal reward after taking action $a$ in a given state $s$, which is defined as shown below:

$$Q(s, a) = W_5^{\top} \, \mathrm{ReLU}\left(z_a^{\top} \cdot z_s \cdot W_4\right) \tag{3}$$

Here, $W_4 \in \mathbb{R}^{p \times 1}$ and $W_5 \in \mathbb{R}^{p \times 1}$ are learnable weight parameters, and $z_s, z_a \in \mathbb{R}^{1 \times p}$ are the state embedding (graph) and the action embedding, respectively, which are produced by Algorithm 1.
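A decoder with these shapes can be sketched as below. The sketch assumes the bilinear form $Q(s, a) = W_5^{\top} \mathrm{ReLU}(z_a^{\top} z_s W_4)$, which is one form that matches the stated dimensions of $W_4$, $W_5$, $z_s$, and $z_a$; the function name `q_value` and the flat-list representation of the column vectors are choices made for this example.

```python
def q_value(z_s, z_a, W4, W5):
    """Scalar Q(s, a) under the assumed form W5^T ReLU(z_a^T z_s W4).

    z_s and z_a are row vectors of length p; W4 and W5 are p x 1 columns,
    all given as flat lists.
    """
    p = len(z_s)
    # z_s W4 is (1 x p)(p x 1), i.e. a single number.
    gate = sum(z_s[i] * W4[i] for i in range(p))
    # z_a^T scaled by that number is a p x 1 column; apply ReLU elementwise.
    hidden = [max(0.0, z_a[i] * gate) for i in range(p)]
    # W5^T hidden is (1 x p)(p x 1), i.e. the scalar Q-value.
    return sum(W5[i] * hidden[i] for i in range(p))

# Hypothetical embeddings and weights with p = 2.
q = q_value(z_s=[1.0, 0.0], z_a=[0.5, -0.5], W4=[2.0, 3.0], W5=[1.0, 1.0])
print(q)
```

The state embedding enters only through the scalar `gate`, so under this assumed form the graph state rescales the action embedding before the final linear read-out.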

Training Algorithm of MiniKey
To leverage a minimum number of nodes to find the optimum key player sets in complex networks, we formulate this problem as a Markov Decision Process (MDP) on a graph. The reinforcement learning components of MiniKey are outlined below:
States: The state, denoted as $S$ in MiniKey, is the current configuration of the network graph $G$. It represents the partial solution to the key player identification problem. Each state is represented as a vector in a $p$-dimensional space, which is defined in Algorithm 1 as $z_s$.
Transitions: The transition function in MiniKey is deterministic and corresponds to the selection of a node v ∈ G that is not yet part of the state S.
Actions: An action in MiniKey corresponds to the selection of a node $v$ that is not currently part of the state $S$. The node is also represented by its $p$-dimensional embedding, defined as $z_v$ in Algorithm 1.
Rewards: The reward $r(S, v)$ in MiniKey is defined in terms of ANC, i.e., as the increase in ANC after selecting node $v_k$ as the action and transitioning to the new state $S' = (S, v_k)$. Therefore, it can be expressed as:

$$r(S, v_k) = \frac{1}{n} \cdot \frac{\sigma(G \setminus \{v_1, v_2, \cdots, v_k\})}{\sigma(G)} \tag{4}$$

The cumulative reward of a terminal state $\hat{S}$ aligns with the cost function value of $\hat{S}$:

$$R(\hat{S}) = \sum_{k=1}^{n} r(S_{k-1}, v_k) = R(v_1, v_2, \cdots, v_n) \tag{5}$$

where $S_{k-1} = (v_1, \cdots, v_{k-1})$ is the state before $v_k$ is selected.
In addition to considering the original ANC as the objective of reinforcement learning, we also incorporate an additional structural penalty as a constraint for the agent. The learning objective of MiniKey is to optimize ANC and SLR simultaneously, that is, to identify the optimal key nodes with as few nodes as possible, which leads to multi-objective reinforcement learning [36,37].
Inspired by the network dismantling algorithm CoreHD [38], we observed that, for the network dismantling problem, star-shaped networks are the easiest to disintegrate: removing only the central node disconnects many edges. Conversely, if there are too many nodes with degree 1 or 2 in the largest connected component, a large number of nodes must be consumed to ensure that no edges remain in the network. Nodes with a degree of 1 are usually leaf nodes, and nodes with a degree of 2 are typically located in chains and cycles; the more of these two types of nodes there are, the more nodes need to be removed to ensure that no edges exist in the network. In the extreme case of a chain of N nodes, a strategy that removes nodes from one end may need up to N-1 removals before no edges remain, whereas removing every second node needs only ⌊N/2⌋.
Based on these observations, we modified our reward function to include a penalty term, specifically, the number of nodes in the largest connected component of the remaining network with degrees 1 or 2. This design intends to minimize the presence of nodes with degrees 1 or 2 in the largest connected component while optimizing network dismantling. Experiments demonstrate that this crucial reward function design enables our MiniKey framework to identify the key nodes in the network with fewer nodes.
As illustrated in Equation (6), $R_{tot}(v_1, v_2, \cdots, v_n)$ is the total reward of MiniKey, $R(v_1, v_2, \cdots, v_n)$ is the original reward of NC, and $R_{penalty}(v_1, v_2, \cdots, v_n)$ is the penalty term, where $|N|^{LCC}_{1,2}(G \setminus \{v_1, v_2, \cdots, v_k\})$ is the number of nodes with a degree of 1 or 2 in the largest connected component (LCC) of the remaining graph.
It is worth noting that the learning objective of MiniKey is to minimize the reward function $R_{tot}(v_1, v_2, \cdots, v_n)$, and the original reward function $R(v_1, v_2, \cdots, v_n)$ will inevitably decrease monotonically as the network scale decreases. Consequently, the optimal effect of the shaped reward function $R_{tot}(v_1, v_2, \cdots, v_n)$ can only be achieved when the structural penalty $R_{penalty}(v_1, v_2, \cdots, v_n)$ also decreases. That is, the size of the largest connected component in the remaining network decreases fastest while the number of nodes with a degree of 1 or 2 in that component is also reduced.
Mathematics 2023, 11, 3690
Policy: The policy in MiniKey is based on the approximated Q-function, $\hat{Q}$. A deterministic greedy policy $\pi(v \mid S) = \arg\max_{v' \notin S} \hat{Q}(z_s, v')$ is applied. When action $v$ is taken, a node from $G$ is added to the current partial solution, leading to a reward $r(S, v)$, with $Q(s, a)$ as defined in Equation (3).
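The quantity that drives the structural penalty described under Rewards, namely the number of nodes with degree 1 or 2 in the largest connected component of the remaining graph, can be counted as in the sketch below. How this count is normalized and combined with the ANC reward is not reproduced here; the function name `low_degree_nodes_in_lcc` and the tie-breaking rule (the first largest component found is used) are assumptions for this example.

```python
from collections import deque

def low_degree_nodes_in_lcc(adj, removed):
    """Count nodes of degree 1 or 2 in the largest component after removals."""
    removed = set(removed)
    seen = set(removed)
    largest = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        component, queue = [], deque([start])
        while queue:  # breadth-first search over one component
            u = queue.popleft()
            component.append(u)
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        if len(component) > len(largest):  # ties keep the first one found
            largest = component
    count = 0
    for u in largest:
        # Degree is measured in the remaining graph, not the original one.
        degree = sum(1 for w in adj[u] if w not in removed)
        if degree in (1, 2):
            count += 1
    return count

# Hypothetical toy graphs: a star (hub 0) and a chain 0-1-2-3-4.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
chain = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(low_degree_nodes_in_lcc(star, []), low_degree_nodes_in_lcc(chain, []))
```

Every node of the chain counts toward the penalty, whereas the star's count drops to zero as soon as the hub is removed, which mirrors the motivation given for the penalty term.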
Based on the aforementioned modeling, we use the DQN [39] algorithm to train MiniKey, using simulated Barabási-Albert (BA) graphs as training samples. DQN is a variant of reinforcement learning in which Q-learning is combined with deep neural networks. The training objective can be mathematically defined by the following cost function:

$$L(\theta) = \mathbb{E}_{(s, a, r, s') \sim D}\left[\left(r + \gamma \max_{a'} Q(s', a'; \theta) - Q(s, a; \theta)\right)^2\right] \tag{7}$$

In Equation (7), $s$ represents the current state, $a$ is the action taken, $r$ is the reward received, and $s'$ is the new state after taking action $a$. $D$ is the experience replay memory, $Q(s, a; \theta)$ is the Q-value function approximated by the network with parameters $\theta$, and $\gamma$ is the discount factor.
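To make the temporal-difference objective concrete, the following sketch evaluates the mean squared TD error over a small batch of transitions. It is a simplified illustration rather than the training code: the Q-function is a plain lookup table instead of a neural network, the same Q-function is used for the bootstrap target, and the names `td_loss` and `q_table` as well as the transition layout are assumptions for this example.

```python
def td_loss(batch, q, gamma):
    """Mean squared TD error over (s, a, r, s_next, next_actions) transitions."""
    total = 0.0
    for s, a, r, s_next, next_actions in batch:
        # A terminal next state offers no actions, so the bootstrap term is zero.
        bootstrap = max((q(s_next, a2) for a2 in next_actions), default=0.0)
        target = r + gamma * bootstrap
        total += (target - q(s, a)) ** 2
    return total / len(batch)

# Hypothetical tabular Q-function standing in for the network Q(s, a; theta).
q_table = {("s0", "a"): 1.0, ("s1", "a"): 2.0, ("s1", "b"): 3.0}

def q(state, action):
    return q_table[(state, action)]

replay_batch = [
    ("s0", "a", 0.5, "s1", ["a", "b"]),  # non-terminal transition
    ("s1", "b", 1.0, "end", []),         # terminal transition
]
loss = td_loss(replay_batch, q, gamma=0.9)
print(loss)
```

In an actual DQN setup, the batch would be sampled from the replay memory $D$ and the loss would be minimized with respect to the network parameters by gradient descent.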

Results on Synthetic Graphs
During the training phase, we employed the Barabási-Albert (BA) model with the default parameter setting m = 4 (the number of edges attached from a new node to existing ones), with the number of nodes chosen uniformly from the range [30, 50]. All experiments were conducted on a Huawei Cloud platform equipped with an NVIDIA Tesla V100-32GB GPU.
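Training graphs of this kind can be generated with a short preferential-attachment routine. The sketch below is a plain-Python illustration, not the generator used in the experiments: the seeding scheme (the first new node attaches to m initial isolated nodes) and the names `barabasi_albert` and `sample_training_graph` are assumptions for this example.

```python
import random

def barabasi_albert(n, m, rng):
    """Grow an n-node graph in which each new node attaches to m distinct
    existing nodes chosen with probability proportional to their degree."""
    if not 1 <= m < n:
        raise ValueError("need 1 <= m < n")
    adj = {v: set() for v in range(n)}
    targets = list(range(m))  # assumption: the first new node joins nodes 0..m-1
    repeated = []             # each node appears once per incident edge
    for new in range(m, n):
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
        repeated.extend(targets)
        repeated.extend([new] * m)
        # Sampling from `repeated` is sampling proportionally to degree.
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(repeated))
        targets = list(chosen)
    return adj

def sample_training_graph(rng, m=4, n_min=30, n_max=50):
    """Draw the node count uniformly from [n_min, n_max], then grow a BA graph."""
    n = rng.randint(n_min, n_max)
    return barabasi_albert(n, m, rng)

graph = sample_training_graph(random.Random(0))
num_edges = sum(len(nbrs) for nbrs in graph.values()) // 2
print(len(graph), num_edges)
```

With this seeding scheme, a graph on n nodes has m(n - m) edges, since each of the n - m added nodes contributes exactly m new edges.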
In the testing phase, synthetic graphs were generated with m = 4 but with node counts divided into six scales: 30-50, 50-100, 100-200, 200-300, 300-400, and 400-500. For each scale, we created 100 test graphs. We then measured the performance of several algorithms, namely HDA [12], HBA [14], HCA [40], HPRA [15], FINDER [30], and MiniKey, on these datasets, which permits a detailed comparison of the methods across different graph sizes.
Figure 2 gives an overview of the average performance (ANC and SLR) of the methods on these synthetic datasets; the error bars depict the standard deviation of each method across the 100 test graphs. Figure 2a, which shows the ANC results, indicates that MiniKey generally outperforms the other techniques. It achieves the lowest ANC in all categories except the 30-50 and 50-100 ranges, where it is narrowly beaten by HBA. Because a lower ANC indicates a more effective disintegration of the largest connected component, this suggests that MiniKey is more effective at identifying key nodes within the networks. The advantage is most noticeable for the larger graphs (200-500), where MiniKey consistently outperforms the other methods.
Figure 2b presents the SLR results across different graph sizes. The SLR is the ratio of the number of nodes an algorithm removes to break the network completely to the total number of nodes. MiniKey again performs well, consistently using fewer nodes than the majority of the other methods across all graph sizes, which indicates a more efficient use of node removals. However, it is noteworthy that HDA is also competitive, with an SLR closely following MiniKey's.
Moreover, Figure 3, which illustrates the Pareto front comparison of the different methods on the six simulated datasets, shows that MiniKey is competitive both in identifying critical nodes within the network and in efficiently using nodes to break down the network. MiniKey has the best performance when considering both ANC and SLR metrics simultaneously.

Results on Real-World Networks
During the testing phase on real-world graphs, we used the best model, trained on simulated BA graphs (with node range 30-50 and m = 4), as the default parameters for MiniKey. We selected six real datasets from SNAP Datasets [41]: Crime, HI-II-14, Digg, Enron, Gnutella31, and Facebook. These datasets cover a wide range of fields, including criminal networks, biological networks, social networks, and communication networks. Table 1 presents the details of the network structures of these datasets, including node number, edge number, maximum degree, average degree, diameter, clustering coefficient, and assortativity. Furthermore, we compared MiniKey with critical node identification algorithms that can run on large-scale networks, such as CI [16], MinSum [31], CoreHD [38], GND [23], and FINDER [30].
Table 2 and Figure 4 present, respectively, the ANC scores and the ANC curves of the different methods on the six real-world datasets. From the results, we can observe that FINDER and MiniKey consistently outperform the other methods across all datasets. Specifically, on the Crime, HI-II-14, and Digg datasets, FINDER and MiniKey provide the most accurate identification of key nodes. For the Gnutella31 and Facebook datasets, although BPD and MiniKey perform slightly better, FINDER still delivers results that are quite close to the top performers, indicating its effectiveness in key node identification.
Table 3 provides the solution length ratio (SLR), which measures the number of nodes each method uses to successfully decompose the network, a critical factor when evaluating efficiency. MiniKey performs very well in this respect, requiring the fewest nodes across all datasets to accomplish this task. Its advantage is particularly large on the Crime and HI-II-14 datasets, where it significantly outperforms all other methods. FINDER also shows strong results, especially on the Digg and Enron datasets, where it maintains a low SLR.
Additionally, as depicted in Figure 5, which compares the Pareto frontiers of the various methods across the six real-world datasets, MiniKey has a superior performance. In summary, both FINDER and MiniKey perform very well in critical node identification and in the efficiency of network decomposition. MiniKey stands out for maintaining a low SLR across all test datasets. This efficiency results in resource savings and an enhanced performance, marking it as a highly effective solution for network-dismantling problems.

Discussion
The MiniKey algorithm proposed in this paper highlights a significant step forward in identifying key players in complex networks, emphasizing the use of minimal nodes. Evaluations across different datasets showcase its strength and flexibility. However, upon closer examination, clear challenges and details emerge, which are discussed as follows.
Scalability: while MiniKey exhibits notable proficiency with both small simulated graphs and extensive real-world datasets, care should be taken when navigating exceptionally large-scale networks. Networks characterized by complex structures or abundant node interactions might introduce challenges, potentially impacting the algorithm's reliability.
Dependency on Graph Representation: central to MiniKey's efficacy is its reliance on the graph representation technique GraphSage. Nevertheless, as the landscape of graph neural networks (GNNs) continues to evolve, considering newer architectures might be beneficial, for instance, Graph Attention Networks (GAT) [42], Message Passing Neural Networks (MPNN) [43], and Geometric Graph Convolutional Networks (Geom-GCN) [44], among others. Incorporating or adapting components from these developments could potentially offer a more comprehensive embedding for MiniKey, enhancing its capacity to tackle more intricate and diverse graph structures.
Structured Reward Shaping: our approach to reward-shaping is certainly innovative. However, it strongly relies on preset network functional markers. The absence of comprehensive markers or overlooking of pivotal network dynamics might steer the reinforcement learning agent away from the most fruitful actions.
Adaptability to Dynamic Networks: at present, MiniKey is fine-tuned to cater to static graphs. The ever-evolving landscape of dynamic networks, where node relationships fluctuate over time, presents a distinct challenge. Addressing this issue would require major changes to our current approach.
In conclusion, MiniKey presents a promising approach to optimum key players' identification with minimum nodes in complex networks; however, a comprehensive understanding and careful consideration of its limitations and intricacies are pivotal to harnessing its full potential and paving the way for future refinements in complex network analyses.

Conclusions
This paper presents MiniKey, an innovative approach to leveraging minimum nodes for optimum key players' identification in complex networks. Experiments conducted on a range of simulated and real-world datasets attest to the superior performance of MiniKey compared to other leading methods in terms of the Accumulated Normalized Connectivity (ANC) score and Solution Length Ratio (SLR). MiniKey outperforms existing strategies in its ability to identify and eliminate network edges, thereby proving its practical utility in various fields, including crime networks, social networks, communication networks, and bio-networks. Notably, MiniKey's unique strength lies in its efficiency, as it consistently uses fewer nodes to break network connectivity, providing resource savings and an enhanced performance. MiniKey reshapes our understanding of how to maintain or break network connectivity with minimal movements. Such insights can redefine current network analysis techniques, benefiting stakeholders in domains such as social media, urban planning, or epidemiology. Despite these promising results, the potential for future improvements and applications of MiniKey remains vast. Future research might focus on extending the approach to handling larger, more complex network structures or integrating it with other network analysis tools for more comprehensive network solutions. The adaptability and efficacy of MiniKey make it an exciting and promising frontier for network analysis.
Author Contributions: Conceptualization, methodology, formal analysis, investigation, resources, data curation, writing, L.Z. and C.F.; original draft preparation and writing, review and editing, L.Z., C.F. and C.C. All authors have read and agreed to the published version of the manuscript.