When Optimization Meets AI: An Intelligent Approach for Network Disintegration with Discrete Resource Allocation

Abstract: Network disintegration is a fundamental issue in the field of complex networks; its core lies in identifying critical nodes or node sets and removing them to weaken network functionality. Research on this problem has significant strategic value and has attracted increasing attention, with applications including controlling the spread of diseases and dismantling terrorist organizations. In this paper, we focus on the problem of network disintegration with discrete entity resources from the attacker's perspective, that is, optimizing resource allocation to maximize the effect of network disintegration. Specifically, we model the network disintegration problem with limited entity resources as a nonlinear optimization problem and prove its NP-hardness. Then, we design a method based on deep reinforcement learning (DRL), Net-Cracker, which transforms the two-stage entity resource and network node selection task into a single-stage object selection problem. Extensive experiments demonstrate that, compared with the benchmark algorithm, Net-Cracker improves the solution quality by about 8% to 62% while achieving a 30-to-160-fold speedup. Net-Cracker also exhibits strong generalization ability and can find better results in near real time even when the network scale is much larger than that in the training data.


Introduction
With the rapid development of network science, complex networks have been widely applied to describe the connections and interactions among complex systems in daily life [1,2], such as transportation networks, biological networks and terrorist networks. These networks mostly benefit human society, and many studies focus on how to defend and protect their robustness and integrity [3][4][5]. However, there exist many harmful networks in modern society, such as disease spreading networks [6], terrorist networks [7], rumor spreading networks [8], etc. Such harmful networks should be destroyed and disintegrated to minimize their societal threat, a task that has attracted growing attention.
The essence of network disintegration lies in removing a set of the most crucial nodes of the network to achieve an optimal disintegration effect [9]. At present, many studies determine crucial nodes and devise corresponding disintegration strategies by analyzing the network structure, such as homogeneous single-layer networks, homogeneous multi-layer networks, heterogeneous networks, and so on [10][11][12][13]. However, in real-life applications, dismantling certain networks requires not only focusing on the structure of the network itself but also considering the discrete entity resources possessed by the attacker that can destroy the opponent's network. For example, when disrupting terrorist networks, it is necessary to take full account of the operational range and destructive capabilities of the weapon entities possessed by attackers [14]. How the attackers allocate discrete weapons to strike members of terrorist networks will affect the efficacy of terrorist network disintegration. Similarly, in suppressing the spread of the COVID-19 virus network, it is essential to take into account the treatment capacity and scope of care provided by cabin hospital entities for infected individuals [15]. The strategy of allocating appropriate cabin hospitals to treat the correlated infected individuals will affect how well the spread of the disease network is interrupted. The above entity resources, including weapons and cabin hospitals, are typically distributed discretely in different locations and exert a decisive influence on the disintegration effect of the network. Thus, incorporating the entity resources possessed by the attacker into the problem of network disintegration holds great theoretical and practical significance.
Figure 1 gives an illustrative example of network disintegration with entity allocation. In Figure 1, there are thirteen nodes in the network, and node $v_6$ has the highest degree, which usually means it is a crucial node. The attacker has deployed four entities $\{w_1, w_2, w_3, w_4\}$, and each entity has its attack range, represented by a light pink region. It is easy to see that node $v_6$ cannot be destroyed by the attacker. The above example illustrates that network disintegration should consider not only the network structure itself but also the attack effect of entities. In fact, the nodes in the network are not equally difficult to remove. For example, removing a hub node may require more entities than removing other nodes. In addition, each entity can produce only limited damage. We will give an example with detailed values in Figure 2 to illustrate that the resource allocation strategy has an important impact on network disintegration.
In this paper, we study the problem of optimizing resource allocation to maximize the effect of network disintegration. There are three main challenges in solving this problem: (i) How can we accurately allocate the limited entities to maximize the disintegration effect? Many factors shape the final solution, including the heterogeneity of the removal difficulty of nodes, the attack range of entities, the attack ability values of different entities, and so on. Furthermore, as the entities can only be used in a discretized way, resource fragmentation is unavoidable. (ii) How can we dynamically generate the disintegration strategy? For example, when destroying terrorist networks, weapon entities and terrorists will dynamically join or exit the war on demand, so the disintegration strategy must change accordingly. It is challenging to generate a disintegration strategy with such adaptability. (iii) How can we quickly determine the resource allocation solution in large-scale cases? Network disintegration is an NP-hard problem [16]. Traditional research on network disintegration mainly adopts approximation and metaheuristic algorithms, which often struggle to strike a satisfactory balance between solution speed and quality [17][18][19]. The less computing time it takes to find a satisfactory solution among many alternative disintegration strategies, the faster decisions can be made and actions taken, and the greater the opportunity to eliminate terrorist networks and virus networks.
To tackle the above challenges, we propose a deep reinforcement learning-based approach for the network disintegration problem. DRL has achieved a satisfactory balance between effectiveness and efficiency in solving combinatorial optimization problems over the past decade [20][21][22]. To sum up, the main contributions of this paper include the following:

• We model the network disintegration problem with discrete entity allocation using nonlinear optimization programming. By looking into the characteristics of limited entity resources and the heterogeneity of the removal difficulty of different nodes, we reveal that existing solutions cannot balance effectiveness and efficiency well.
• We propose Net-Cracker, a deep reinforcement learning method to solve the network disintegration problem. This approach transforms the two-stage entity and network node selection task in the solution process into a new object selection form so as to simplify the solving process.
• We conduct extensive experiments in multiple settings. The results demonstrate that our method has significant advantages regarding solution quality, computation time and scalability compared to the traditional method.
The rest of the paper is organized as follows. In Section 2, we present the related work on network disintegration and deep reinforcement learning. In Section 3, we give the problem of network disintegration and the mathematical model. Then, we describe the Net-Cracker model in detail in Section 4. Our evaluation method and experimental results are shown in Section 5. In Section 6, we discuss the proposed Net-Cracker method. Finally, this paper is concluded in Section 7.

Network Disintegration
As an active topic in the research of complex networks, network disintegration has garnered widespread attention from numerous scholars over the past decade. Different studies use different methods to identify the key node set for specific networks, though the results are mixed.
There are two categories of network disintegration research. One is network disintegration without resource limitations. In this case, many scholars have proposed disintegration strategies based on the structural properties of nodes, including the degree-based strategy [23], the betweenness-based strategy [24], and the clustering coefficient-based strategy [25], in which nodes are removed in descending order of the corresponding node property. Furthermore, disintegration methods have been designed based on the structural characteristics of the network. Deng et al. [13] present a multiplex network disintegration strategy based on Tabu search, in which the disintegration effect is superior to typical disintegration strategies. Li et al. [11] put forward an operational capability disintegration method for combat networks under incomplete information. However, network disintegration without resource constraints is an idealization that rarely holds in the real world. For different scenarios, the concrete resources required to disintegrate a network vary, such as the cost required to eliminate a virus network and the weapon entities required to destroy a terrorist network. Subsequently, another line of research introduced network disintegration models with resource limitations [26][27][28], wherein the authors proposed a network disintegration model with limited cost, which assumes that removing different nodes requires different costs. These problems are essentially resource allocation problems [29], that is, how to allocate cost resources to network nodes so that the disintegration effect is the best.
Nevertheless, a crucial issue is that although existing research incorporates cost resource constraints into network disintegration, this cost is a continuous resource. Empirically, however, the resources required for some network disintegration tasks are concrete entities, tools or services that have discrete attributes. Current disintegration methods cannot solve the network disintegration problem with discrete resources. Furthermore, most existing research focuses on the structural characteristics of networks to design a disintegration strategy, with little consideration of destroying harmful networks with limited entity resources from the attacker's perspective.

Deep Reinforcement Learning in Combinatorial Optimization Problems
Combinatorial optimization problems (COPs) are widely used in national defense, transportation, product manufacturing and other fields [30]. Common COPs, such as the traveling salesman problem (TSP) [31], the vehicle routing problem (VRP) [32], and the minimum vertex cover problem (MVC) [33], aim to find the optimal solution from a finite set of objects and are NP-hard. Traditional methods for solving COPs mainly adopt exact approaches [34] and approximate approaches [35]. However, as the scale of practical problems and the need for real-time solutions increase, it is difficult for traditional methods to generate optimal COP solutions quickly. Moreover, traditional methods use iterative search and lack the ability to learn from historical data. As soon as the data for the same problem change, the problem has to be searched and solved again, resulting in higher computational costs.
To tackle the above issue, researchers proposed an end-to-end DRL method [36]: a trained deep neural network is used to directly output the solution to the problem, and the parameters of the neural network are generally trained on a set of problem instances of the same type.
Hopfield et al. [37] first attempted to use neural networks to solve a COP and verified the approach on a small-scale TSP. However, for a newly given TSP, the network has to be trained again from scratch, which offers no advantage over traditional algorithms. To solve COPs effectively, based on the sequence-to-sequence (Seq2Seq) model in the field of machine translation [38], Vinyals et al. [39] proposed a pointer network model for the TSP. It achieved satisfactory results, which triggered a wave of using deep neural networks to solve COPs. Subsequently, to alleviate the difficulty of obtaining labels for supervised learning methods, Bello et al. [40] used reinforcement learning to train the pointer network model and introduced a critic network as a baseline to reduce the training variance. This approach is more scalable than traditional algorithms on the TSP and the knapsack problem. On the other hand, to solve COPs with graph structure, such as the MVC problem, Dai et al. first proposed using a graph neural network named structure2vec to solve COPs [41]. In addition, Li et al. used graph convolutional networks and guided tree search techniques for the MVC and maximal independent set (MIS) problems [42]. This approach effectively handles the situation where multiple optimal solutions exist.

Deep Reinforcement Learning Method in Network Disintegration
DRL has been applied to solve the network disintegration problem because it can effectively find the key node set of the network. Fan et al. combined a graph neural network (GNN) with DRL to find an optimal set of nodes in networks, which outperformed existing methods in terms of solution quality [43]. Chen et al. also combined DRL and GNN to search for high-value edge attack sequences and showed that the proposed method has strong applicability across various scenarios [44]. Furthermore, various disintegration methods for different networks have also been proposed in current research. Zeng et al. used a combination of graph neural networks and reinforcement learning to address the disintegration problem of heterogeneous combat networks [45]. A DRL algorithm has also been used to identify the set of key nodes in directed networks [46]. Zeng et al. proposed a solution to the disintegration problem in multiplex networks based on the deep network representation learning model (MINER) [47]. The wide application of the above methods shows the potential of using DRL algorithms to solve network disintegration problems.

Network Disintegration Model with Discrete Entity Resources
In this section, we present the mathematical model of network disintegration under the condition that the entity resources are used in a discrete way. For clarity, the main symbols involved in this article are listed in Table 1.

Table 1. Main symbols.
Symbol        Description
$q_j$         the removal threshold of node $v_j$
$d_{ij}$      the distance between the $i$-th entity and the $j$-th node
$x_{ij}$      whether the $i$-th entity attacks the $j$-th node
$u_j$         the sum of the damage values applied to the $j$-th node
$y_j$         whether the $j$-th node can be removed
$Y$           disintegration strategy
$\hat{G}$     the network after removing nodes

Problem Illustration
As shown in Figure 2, there is a complex network composed of thirteen nodes. The attacker has deployed four entities $\{w_1, w_2, w_3, w_4\}$ around the network. Each entity has its attack range, which is illustrated by the light blue circle. It is easy to see that node $v_1$ can be attacked by $w_1$, $v_3$ can be attacked by $w_3$, and $v_4$ can be attacked by $w_3$ and $w_4$. Additionally, each entity has a different ability to produce harm, and each node in the network has a different removal threshold. In this example, the attack abilities of $\{w_1, w_2, w_3, w_4\}$ are $\{5, 8, 5, 2\}$, respectively, and the removal thresholds of $\{v_1, v_2, v_3, v_4\}$ are $\{5, 9, 5, 6\}$, respectively.
Currently, many researchers have used natural connectivity to measure the effect of network disintegration [27,48]; it characterizes the weighted number of closed walks of all lengths in the network. Let $G$ represent the network topology, and $A$ represent the adjacency matrix of $G$. The natural connectivity of $G$, i.e., $\Gamma(G)$, can be calculated as follows:
$$\Gamma(G) = \ln\left(\frac{1}{N}\sum_{i=1}^{N} e^{\lambda_i}\right) \tag{1}$$
in which $N$ is the number of nodes and $\lambda_i$ is the $i$-th largest eigenvalue of the adjacency matrix $A(G)$.
With Equation (1), we can easily calculate the network connectivity after executing strategy 1 and strategy 2. As shown in Figure 2, the network connectivity under strategy 1 is 2.12, while it is 1.77 under strategy 2, so strategy 2 leaves the network less connected even though both strategies use the same entities.
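As a minimal sketch of how the natural connectivity can be evaluated, the snippet below uses the identity $\sum_i e^{\lambda_i} = \mathrm{tr}(e^{A})$ and sums the power series of $e^{A}$ directly, which also makes the closed-walk interpretation explicit. The function name `natural_connectivity`, the truncation length `terms`, and the two toy graphs are illustrative assumptions, not part of the original experiments; an eigenvalue solver would be preferable for large networks.

```python
import math

def natural_connectivity(adj, terms=60):
    """Gamma(G) = ln(trace(exp(A)) / N), using a truncated power series of exp(A)."""
    n = len(adj)
    if n == 0:
        raise ValueError("graph must have at least one node")
    # `power` holds A^k / k!; it starts at k = 0 with the identity matrix.
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    trace_exp = float(n)  # trace of the k = 0 term
    for k in range(1, terms):
        power = [[sum(power[i][m] * adj[m][j] for m in range(n)) / k
                  for j in range(n)] for i in range(n)]
        # trace(A^k) counts closed walks of length k; 1/k! is their weight.
        trace_exp += sum(power[i][i] for i in range(n))
    return math.log(trace_exp / n)

# Toy graphs (assumed for illustration): a triangle, and the single edge
# that remains after one of its nodes is removed.
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
edge = [[0, 1], [1, 0]]
gamma_triangle = natural_connectivity(triangle)
gamma_edge = natural_connectivity(edge)
```

In this toy case, removing a node lowers the measure, which is the behaviour the disintegration objective relies on.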
From the above example, we can formally define the network disintegration problem with discrete entity resources.
Definition 1. Given a network, which can be represented by $G = (V, E)$, and multiple entities $W = \{w_1, w_2, \dots, w_M\}$, each node in $V$ has location information and a removal threshold, and each entity in $W$ has location information, an attack range and an attack ability value. Then, the network disintegration problem is to allocate the entities to suitable nodes so that the network connectivity of $G$ is minimized.

Problem Model
The complex network can be abstracted as an undirected graph, denoted as $G = (V, E)$, where $V = \{v_1, v_2, \dots, v_N\}$ is the node set and $E = \{e_1, e_2, \dots, e_T\}$ is the set of edges representing the interactions between nodes. Here, $N$ and $T$ represent the number of nodes and edges, respectively. The adjacency matrix $A(G) = (a_{ij})_{N \times N}$ of $G$ is defined as follows: if $v_i$ and $v_j$ are connected, then $a_{ij} = 1$; otherwise, $a_{ij} = 0$. Furthermore, the node $v_j$ can be described as $v_j := \langle x_j, y_j, z_j, k_j \rangle$, where $(x_j, y_j, z_j)$ represents the location coordinates of the node $v_j$, and $k_j$ represents the degree of node $v_j$, that is, the number of edges incident to $v_j$.
We assume that the attacker has $M$ entity resources, which can be denoted by a set $W = \{w_1, w_2, \dots, w_M\}$. The state of the entity $w_i$ can be described by a tuple $w_i := \langle x_i, y_i, z_i, c_i, r_i \rangle$. Specifically, (i) $(x_i, y_i, z_i)$ represents the location coordinates of the entity $w_i$, (ii) $c_i$ represents the attack ability of the entity, and (iii) $r_i$ indicates the attack range of the entity.
The damaging effect of entities on nodes depends on two aspects: the attack ability value of the entity and the removal threshold of the node. The higher the attack ability value of the entity, the stronger the damage to the node. In addition, the difficulty of removing a node escalates as its removal threshold increases. Ren et al. [49] assumed that the cost of removing nodes is proportional to the degree of nodes. Similarly, we define the node removal threshold $q_j$ as the entity attack ability value required to remove the node $v_j$ and assume, in Equation (2), that $q_j$ is a function of the degree $k_j$ of node $v_j$ and of a random disturbance value $\alpha_j$, which indicates that external factors, such as electromagnetic interference, affect the node removal threshold.
Obviously, an entity can be used to attack a node only when the node is within the attack range of the entity. Here, we use the Euclidean distance $d_{ij}$ to define the distance between the entity $w_i$ and the node $v_j$, and we use an indicator function $F(d_{ij})$ to indicate whether $w_i$ can be used to attack $v_j$, as shown in Equations (3) and (4):
$$d_{ij} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2 + (z_i - z_j)^2} \tag{3}$$
$$F(d_{ij}) = \begin{cases} 1, & d_{ij} \le r_i \\ 0, & \text{otherwise} \end{cases} \tag{4}$$
We use the binary variable $x_{ij}$ to indicate whether $w_i$ attacks $v_j$. If $w_i$ attacks $v_j$, then $x_{ij} = 1$; otherwise, $x_{ij} = 0$.
To ensure that each entity can only be assigned to attack one node, we have
$$\sum_{j=1}^{N} x_{ij} \le 1, \quad \forall i \in \{1, \dots, M\}$$
And to ensure that the entity can only be used to attack nodes within its attack range, we have
$$x_{ij} \le F(d_{ij}), \quad \forall i \in \{1, \dots, M\},\ \forall j \in \{1, \dots, N\}$$
The node can be removed only when the combined damage of the entities reaches its removal threshold. In this paper, we assume that the damage effect of multiple entities on the same node is a linear sum of their respective damage values. Let $u_j$ denote the sum of the damage values of the entities on the node $v_j$; then
$$u_j = \sum_{i=1}^{M} c_i x_{ij}$$
We use the binary variable $y_j$ to represent whether node $v_j$ can be removed. If node $v_j$ can be removed, then $y_j$ is 1; otherwise, $y_j$ is 0, which can be represented by
$$y_j = \begin{cases} 1, & u_j \ge q_j \\ 0, & \text{otherwise} \end{cases}$$
Let $\hat{V} \subseteq V$ represent the removed node set, and denote the network after removing nodes as $\hat{G} = (V - \hat{V}, \hat{E})$. Then, the disintegration strategy can be represented by $Y = [y_1, y_2, \dots, y_N]$, where $v_j \in \hat{V}$ if $y_j = 1$. As introduced in Section 3.1, the effect of the disintegration strategy can be depicted by $\Phi(Y) = \Gamma(\hat{G})$. The objective of network disintegration is to find a disintegration strategy $Y^*$ that minimizes the network connectivity.
To sum up, the problem defined in Definition 1 can be represented as follows:
$$\begin{aligned} \min_{Y} \quad & \Phi(Y) = \Gamma(\hat{G}) \\ \text{s.t.} \quad & \sum_{j=1}^{N} x_{ij} \le 1, && \forall i \\ & x_{ij} \le F(d_{ij}), && \forall i, j \\ & u_j = \sum_{i=1}^{M} c_i x_{ij}, && \forall j \\ & y_j = 1 \text{ if } u_j \ge q_j, \text{ and } y_j = 0 \text{ otherwise}, && \forall j \\ & x_{ij} \in \{0, 1\},\ y_j \in \{0, 1\}, && \forall i, j \end{aligned} \tag{9}$$
The existing literature [50] has proven that the natural connectivity strictly decreases once nodes are removed. Therefore, a lower $\Phi$ value implies a more destructive disintegration strategy.
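The constraints of the model can be checked mechanically for a candidate allocation. The following sketch, under stated assumptions, takes an allocation as a mapping from entity index to node index (so each entity attacks at most one node by construction), rejects out-of-range assignments, sums the damage per node, and returns the removal indicators $y_j$. The function name `evaluate_allocation`, the tuple layouts, the coordinates, and the rule that a node is removed when the damage reaches its threshold are illustrative assumptions.

```python
import math

def evaluate_allocation(entities, nodes, assignment):
    """Return the removal indicators y_j for a candidate allocation.

    entities:   list of (x, y, z, c, r) tuples (position, ability, range)
    nodes:      list of (x, y, z, q) tuples (position, removal threshold)
    assignment: dict {entity index i: node index j}; a dict gives each
                entity at most one target, matching the assignment constraint
    """
    damage = [0.0] * len(nodes)
    for i, j in assignment.items():
        xi, yi, zi, ci, ri = entities[i]
        xj, yj, zj, _ = nodes[j]
        # Range constraint: entity i may only attack nodes with d_ij <= r_i.
        if math.dist((xi, yi, zi), (xj, yj, zj)) > ri:
            raise ValueError(f"node {j} is outside the range of entity {i}")
        damage[j] += ci  # damage adds up linearly across entities
    # Assumed rule: a node is removed once total damage reaches its threshold.
    return [1 if damage[j] >= nodes[j][3] else 0 for j in range(len(nodes))]

# Hypothetical instance: three entities, three nodes.
entities = [(0, 0, 0, 5, 1.0), (1, 0, 0, 5, 1.5), (1, 1, 0, 2, 1.5)]
nodes = [(0.5, 0, 0, 5), (1, 0.5, 0, 6), (3, 3, 0, 1)]
pooled = evaluate_allocation(entities, nodes, {0: 0, 1: 1, 2: 1})
single = evaluate_allocation(entities, nodes, {0: 0, 1: 1})
```

Here the second node is removed only when two entities pool their damage, which illustrates why the allocation, and not only the choice of target nodes, determines the outcome.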

Complexity Analysis
To prove that the network disintegration problem characterized by Model (9) is NP-hard, we first give the definition of a classic NP-complete problem, i.e., the subset sum problem [51].
Definition 2. Given a set of integers $C = \{c_1, c_2, \dots, c_m\}$, the subset sum problem is to decide whether there exists a subset $A \subsetneq C$ such that $\sum_{a \in A} a = \frac{1}{2}\sum_{c \in C} c$.
Theorem 1. The network disintegration problem formulated in Model (9) is NP-hard.
Proof of Theorem 1. Given a subset sum instance $C = \{c_1, c_2, \dots, c_m\}$, we construct an instance of the network disintegration problem from it. As shown in Figure 3, there is a network with two crucial nodes $\{v_1, v_2\}$. There are also $m$ entities deployed near $\{v_1, v_2\}$, and their attack abilities are $\{c_1, c_2, \dots, c_m\}$, respectively. Additionally, we assume that the removal thresholds of both $v_1$ and $v_2$ are $\frac{1}{2}\sum_{i=1}^{m} c_i$, and that only $v_1$ and $v_2$ lie in the attack range of all entities. Since the total attack ability equals the sum of the two thresholds, both nodes can be removed only if the entities are split into two groups whose abilities each sum to exactly half of the total. Obviously, if we could optimally solve the network disintegration problem in the constructed instance, then we could solve the subset sum problem: if both $v_1$ and $v_2$ are removed, then the answer to the subset sum problem is "yes"; otherwise, it is "no". Since the subset sum problem is NP-complete, the network disintegration problem is NP-hard. Thus, Theorem 1 is proved.
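The reduction in the proof can be exercised on small inputs. The sketch below, under stated assumptions, enumerates every way of sending each of the $m$ entities to $v_1$ or $v_2$ and reports whether both nodes can be removed when each threshold equals half of the total attack ability. The function name `both_nodes_removable` and the sample inputs are illustrative; the exhaustive search is exponential and only serves to make the equivalence concrete.

```python
from itertools import product

def both_nodes_removable(abilities):
    """True if the entities can be split so that v1 and v2 are both removed.

    Both removal thresholds equal sum(abilities) / 2, as in the constructed
    instance; comparisons are doubled to stay in integer arithmetic.
    """
    total = sum(abilities)
    for targets in product((0, 1), repeat=len(abilities)):
        damage = [0, 0]
        for c, target in zip(abilities, targets):
            damage[target] += c  # entity attacks v1 (0) or v2 (1)
        if 2 * damage[0] >= total and 2 * damage[1] >= total:
            return True
    return False

balanced = both_nodes_removable([3, 1, 1, 2, 2, 1])  # 3 + 2 = 1 + 1 + 2 + 1
unbalanced = both_nodes_removable([1, 1, 4])          # no subset sums to 3
```

Both nodes are removable exactly when the abilities admit an equal split, which is the "yes" case of Definition 2.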

The Design of Net-Cracker
In this section, we propose an approach named Net-Cracker, which uses the deep reinforcement learning method to find the optimal solution for the network disintegration problem. Firstly, we introduce the framework overview of Net-Cracker, including several key processes. Then, we focus on an encoder-decoder neural network structure with an attention mechanism and explain how Net-Cracker effectively accomplishes the model training.

Framework Overview
DRL is an agent modeling method that combines the feature extraction capability of deep learning and the sequential decision-making capability of reinforcement learning. A complex network disintegration problem can be viewed as a Markov Decision Process (MDP) [52]. In each iteration, the agent chooses one object as an action based on the current state and subsequently updates the state. Therefore, solving the network disintegration problem with the DRL method is appropriate. Recent work has shown that AutoML in the field of machine learning lacks transparency and interpretability when dealing with high-risk medical issues [53], which prompted us to opt for an appropriate deep neural network to circumvent these issues. Specifically, in this paper, we primarily employ the actor-critic (AC) method in DRL [54,55]. In the AC method, the actor is responsible for generating policies that maximize cumulative returns. Meanwhile, the critic evaluates the policy generated by the actor and provides a value function to guide policy updates. The overall framework of Net-Cracker is shown in Figure 4, and its solving process is divided into three stages: combine, selection, and mapping.

Stage I: Combine
In the combination stage, we combine the current $M$ entities and $N$ nodes distributed on the battlefield through a Cartesian product, resulting in $M \times N$ objects. As shown in Figure 5, each object is the combination of one entity and one node and represents one candidate action. Specifically, the object $o_k$ can be denoted as $o_k := \langle w_i, v_j \rangle$, which means that the entity $w_i$ is used to attack the node $v_j$.
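A minimal sketch of the combination stage is given below. It enumerates the $M \times N$ entity-node pairs with a Cartesian product and also shows how a flat object index can be mapped back to its pair, which is what the later mapping stage requires. The function names `build_objects` and `object_to_pair` and the row-major index convention $k = iN + j$ are assumptions made for illustration.

```python
from itertools import product

def build_objects(num_entities, num_nodes):
    """Stage I: enumerate every (entity index, node index) pair as an object."""
    return list(product(range(num_entities), range(num_nodes)))

def object_to_pair(k, num_nodes):
    """Recover (i, j) from the flat object index k = i * num_nodes + j."""
    return divmod(k, num_nodes)

# Four entities and thirteen nodes, as in the illustrative scenario.
objects = build_objects(4, 13)
```

With four entities and thirteen nodes, this produces 52 candidate objects.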

Stage II: Selection
In the selection stage, we use an encoder-decoder neural network with an attention mechanism to output the optimal solution. Specifically, we first employ the encoder to extract features from all objects and generate their embeddings (high-dimensional vectors). Subsequently, these embeddings are fed into the decoding neural network. The decoding process primarily involves selecting an optimal subset of objects from a pool of candidate objects. At each decoding step, we use the decoding neural network in combination with the attention computation method to calculate the attention value of all unselected objects. The agent then greedily selects the object with the highest attention value. Once an object is chosen, it is removed from the candidate object set. This decoding process is repeated until either the maximum constraint on entity attack ability is reached or the maximum number of steps is exhausted. The structure of the neural network will be illustrated in detail in Section 4.2, and the training process of the neural network parameters will be described in Section 4.3.
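The control flow of the selection stage can be sketched independently of the neural network. In the snippet below, `scores_fn` is a hypothetical stand-in for the decoder-plus-attention computation: it receives the objects selected so far and returns one score per object. Only the step limit is implemented as a stopping rule; the stopping rule based on entity attack ability is omitted because its exact form is not specified in this section.

```python
def greedy_select(scores_fn, num_objects, max_steps):
    """Repeatedly pick the highest-scoring object that is still available."""
    selected = []
    available = [True] * num_objects
    for _ in range(min(max_steps, num_objects)):
        scores = scores_fn(selected)  # stand-in for decoder + attention
        best = max((k for k in range(num_objects) if available[k]),
                   key=lambda k: scores[k])
        selected.append(best)
        available[best] = False       # a chosen object leaves the candidate set
    return selected

# Hypothetical static scores; a trained model would recompute them each step.
static_scores = [0.2, 0.9, 0.5, 0.7]
order = greedy_select(lambda chosen: static_scores, len(static_scores), max_steps=3)
```

Because every chosen object is masked out immediately, no object can be selected twice even when the scores do not change between steps.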

Stage III: Mapping
Based on the previous two stages, we can obtain the optimal object set. By mapping the objects into the entity-node pairs, we can derive a selection of entities and removed nodes, subsequently calculating the target value through the objective function $\Phi$.
The design of the state, action and reward function is crucial when using reinforcement learning. Specifically, we define the state $S$ as a set of selected objects. At time $t$, if the agent has already selected $t - 1$ objects, then the state information can be expressed as $s_t = \{o_1, o_2, \dots, o_{t-1}\}$, where $s_t \in S$. The action space is a set that includes all possible actions that an agent can perform in a specific state. In our problem, selecting an object is considered an action, and thus, the dimension of the action space is $M \times N$. The reward for each solution is defined to be equivalent to its objective function, which measures network performance after nodes are removed.
In this way, we can obtain the solution of the network disintegration problem. During the training phase, we compute the reward value of the solution based on the designed reward function and subsequently employ it for backpropagation and the adjustment of network parameters. Once the loss value stabilizes and the desired reward criteria are met, a well-trained network model is obtained. During the testing phase, we can utilize this trained network model to rapidly achieve high-quality disintegration outcomes by inputting the entity and network node information.

Detailed Design of the Neural Network Architecture
In this section, we mainly introduce the detailed design of the neural network architecture of the actor network in the AC framework, which includes encoding, decoding and attention modules, aiming to address the following problems: (i) How can we efficiently extract the multidimensional static information of input objects during the encoding process? (ii) How can we output the probability distribution of selecting each object based on the current state and available action space during the decoding process? (iii) How can we handle the situation where the output objects form an ordered sequence and the length of the output sequence is different from the length of the input sequence? Figure 6 shows the detailed design of the neural network architecture.

The Encoder
The encoder is designed to map the state information of the input sequence, enabling the agent to comprehend the representation of each object. As the order of input object encoding does not affect the choice of subsequent actions, we adopt a one-dimensional convolutional neural network (CNN) as the encoding network to reduce model complexity. Each input object $o_i$ is encoded into an embedding vector $e_i$, forming an encoding matrix $E = \{e_1, e_2, \dots, e_{MN}\}$ with dimensions $(M \times N) \times d_h$, where $d_h$ represents the dimensionality of the embedding vector.
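To illustrate why such an encoder is insensitive to input order, the sketch below assumes a one-dimensional convolution with kernel size 1, which amounts to applying the same affine map to every object's feature vector. The kernel size, the function name `encode`, and the toy weights are assumptions; the section does not state the kernel size actually used.

```python
def encode(objects, weight, bias):
    """Apply one shared affine map to each object's feature vector.

    objects: list of feature vectors, each of length d_in
    weight:  d_h rows of length d_in; bias: d_h values
    Returns one embedding of length d_h per object.
    """
    return [[sum(w * x for w, x in zip(row, features)) + b
             for row, b in zip(weight, bias)]
            for features in objects]

# Toy example with d_in = 2 and d_h = 3 (values chosen only for illustration).
features = [[1.0, 2.0], [0.0, 1.0]]
weight = [[1.0, 0.0], [0.5, 0.5], [0.0, -1.0]]
bias = [0.0, 1.0, 0.0]
embeddings = encode(features, weight, bias)
```

Reordering the input objects simply reorders the embeddings, since each embedding depends only on its own object.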

The Decoder
Using the encoding vector generated by the encoder as input, the decoder sequentially decodes the current state into high-dimensional hidden states. Contrary to the encoding network, the decoding process needs to consider the information of the decoded sequence, so we use a recurrent neural network (RNN) with a memory storage function as the decoding network. In this way, we can obtain the hidden layer state $d_t$ from the RNN, where $d_t$ contains the output sequence information $\{y_0, y_1, \dots, y_{t-1}\}$ produced by the decoder before step $t$ and serves as the query for the attention layer.

The Attention
As the output dimension is determined dynamically by the dimension of the input, the output must be able to adapt accordingly. The conventional Seq2Seq model lacks this flexibility and cannot address the dynamic output dimension problem. Fortunately, Vinyals et al. [39] introduced the attention mechanism into the Seq2Seq model, yielding promising results. Therefore, we also incorporate the attention mechanism into our neural network architecture to tackle this problem. As depicted in Equation (10), at step $t$, we compute a weighted sum of the decoded hidden layer state $d_t$ and the embedding vector $e_j$, which is subsequently passed through the tanh activation function; $v$, $W_a$, and $W_b$ in Equation (10) are trainable parameters.
To enhance algorithm efficiency and reduce the action space, we incorporate a masking mechanism into the neural network output. Once an object is selected, it no longer appears in the candidate object set. Specifically, a binary mask is employed to determine the validity of object $o_j$ at time $t$. Finally, we derive the conditional probability distribution of selecting the next action. During the training stage, we employ importance sampling to select the next action, ensuring that even actions with low probabilities are considered. However, during the test stage, we adopt the greedy policy to select the action with the highest probability. The detailed solution process of our proposed model is illustrated in Algorithm 1.
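The attention, masking and normalization steps can be sketched as follows, assuming the additive pointer-network form $v^\top \tanh(W_a e_j + W_b d_t)$ for the attention value, a mask that sets invalid objects to $-\infty$ before a softmax, and plain Python lists in place of tensors. The pairing of $W_a$ with $e_j$ and $W_b$ with $d_t$, the function name `attention_probs`, and all numeric values are assumptions.

```python
import math

def attention_probs(d_t, embeddings, v, W_a, W_b, mask):
    """Masked softmax over assumed additive attention scores.

    mask[j] is True when object j is still a valid candidate.
    """
    if not any(mask):
        raise ValueError("at least one object must remain selectable")

    def matvec(W, x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in W]

    query = matvec(W_b, d_t)  # the decoder state acts as the query
    logits = []
    for e_j, valid in zip(embeddings, mask):
        if not valid:
            logits.append(-math.inf)  # masked objects get zero probability
            continue
        key = matvec(W_a, e_j)
        logits.append(sum(vi * math.tanh(k + q)
                          for vi, k, q in zip(v, key, query)))
    top = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(u - top) for u in logits]
    total = sum(exps)
    return [x / total for x in exps]

identity = [[1.0, 0.0], [0.0, 1.0]]
probs = attention_probs(d_t=[0.0, 0.0],
                        embeddings=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
                        v=[1.0, 1.0], W_a=identity, W_b=identity,
                        mask=[True, False, True])
greedy_choice = max(range(len(probs)), key=probs.__getitem__)
```

At test time the greedy choice above would be used, whereas during training an object would be sampled from `probs` so that low-probability actions are still explored.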

Training Procedure
The AC framework primarily comprises two networks: The actor network, which consists of an encoder, a decoder and an attention module, is designed to generate the probability distribution for action selection in the current state, as previously discussed.On the other hand, the critic network estimates the state value for a given problem instance and shares structural similarities with the encoder of the actor network.During the training process, our main goal is to obtain the optimal network parameters so as to output the best results in the test phase.
For the input problem instance $s$, the network parameters are set to $\theta$, and the training objective of the network is defined as follows:
$$J(\theta \mid s) = \mathbb{E}_{Y \sim p_\theta(\cdot \mid s)}\, \Phi(Y \mid s)$$
where $\Phi(Y \mid s)$ denotes the objective value $\Phi$ of the solution sequence $Y$ under a given instance $s$.
We use the policy gradient [56] to optimize the parameters as follows:
$$\nabla_\theta J(\theta \mid s) = \mathbb{E}_{Y \sim p_\theta(\cdot \mid s)}\left[\left(\Phi(Y \mid s) - b(s)\right) \nabla_\theta \log p_\theta(Y \mid s)\right]$$
where $b(s)$ represents the baseline, which is utilized to estimate the expected value of the solution for a given instance and aids in reducing the variance of the gradient.
In practice, problem instances $s_1, s_2, \dots, s_B \sim S$ are sampled by the Monte Carlo method, and the average over these samples replaces the expected value:
$$\nabla_\theta J(\theta) \approx \frac{1}{B}\sum_{i=1}^{B}\left(\Phi(Y_i \mid s_i) - b(s_i)\right) \nabla_\theta \log p_\theta(Y_i \mid s_i)$$
To improve the learning efficiency of the actor network, it is common practice to introduce a parameterized baseline for estimating the expected objective value. Therefore, we introduce a critic network parameterized by $\phi$ to learn the expected objective value found by the current policy $p_\theta$ given the input instance $s$. The parameters of the critic network are trained through the mean-squared error between the sampled actual objective value and the estimated value $b_\phi(s_i)$:
$$L(\phi) = \frac{1}{B}\sum_{i=1}^{B}\left\| b_\phi(s_i) - \Phi(Y_i \mid s_i) \right\|_2^2$$
where $b_\phi(s_i)$ represents the output baseline of the critic. In our training procedure, the parameters of the critic and the actor are both updated in every training iteration. The actor parameters $\theta$ are updated using the output of the critic, ensuring that the actor parameters are updated in a direction that improves the policy. The detailed training process is depicted in Algorithm 2.
Algorithm 2 (excerpt): compute the estimated value $b_\phi(s_i)$; update $\theta$ with $d\theta$ and adjust $\phi$ with $d\phi$.
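The interaction between the actor update and the critic baseline can be illustrated on a deliberately small stand-in problem. The sketch below is not the Net-Cracker training loop: the "policy" is a softmax over a handful of one-step actions with fixed costs, the "critic" is a single scalar baseline, and the gradients are written out by hand. The names `train`, `lr_actor`, `lr_critic`, the hyperparameter values, and the cost vector are all assumptions.

```python
import math
import random

def softmax(logits):
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [x / total for x in exps]

def train(costs, steps=2000, batch=16, lr_actor=0.1, lr_critic=0.1, seed=0):
    """REINFORCE with a learned scalar baseline, minimizing expected cost."""
    rng = random.Random(seed)
    logits = [0.0] * len(costs)  # actor parameters (theta)
    baseline = 0.0               # critic output b (phi), one scalar here
    for _ in range(steps):
        probs = softmax(logits)
        actions = rng.choices(range(len(costs)), weights=probs, k=batch)
        actor_grad = [0.0] * len(costs)
        critic_grad = 0.0
        for a in actions:
            advantage = costs[a] - baseline
            for k in range(len(costs)):
                # d log p(a) / d logit_k = 1[k == a] - p_k
                indicator = 1.0 if k == a else 0.0
                actor_grad[k] += advantage * (indicator - probs[k]) / batch
            # Gradient of the mean-squared error (baseline - cost)^2.
            critic_grad += 2.0 * (baseline - costs[a]) / batch
        logits = [l - lr_actor * g for l, g in zip(logits, actor_grad)]
        baseline -= lr_critic * critic_grad
    return softmax(logits), baseline

final_probs, final_baseline = train([3.0, 1.0, 2.0])
```

Subtracting the baseline leaves the expected gradient unchanged while shrinking its variance, which is the role the critic plays in the procedure described above.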

Performance Evaluation
In this section, we first describe our experimental settings and comparative methods. Then, we present the results of extensive computational experiments to evaluate the proposed Net-Cracker method.

Dataset
Since the constrained network disintegration problem formulated by Model (9) is being studied for the first time, there are no public datasets. Therefore, we generate the dataset independently as follows.
Two classical synthetic network structures, i.e., the scale-free (SF) network and the Erdős-Rényi (ER) random network, are selected to construct the architecture of the network. The ER network is a random network in which the node degrees approximately follow a Poisson distribution, while the SF network is a network in which the node degrees follow a power-law distribution. We set a square area of size 2 × 2 to represent the battlefield and then randomly distribute the network nodes and entities on it. The attack ranges of the entities are sampled from a uniform distribution on [0, 2], and the attack abilities of the entities are sampled from a uniform distribution on [0, 10]. The α_j value, which indicates the influence of other factors on the node removal threshold, is randomly generated in [0, 1].
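The generation procedure above can be sketched as follows. This is an illustrative reconstruction, not the authors' generator: the ER edge probability `p`, the use of Barabási-Albert preferential attachment with `m` links per new node for the SF network, the dictionary layout, and the choice of one α value per network node are all assumptions, since the text does not specify them.

```python
import random


def er_edges(n, p, rng):
    """Erdős-Rényi G(n, p): each node pair is linked independently with probability p."""
    return [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p]


def ba_edges(n, m, rng):
    """Barabási-Albert growth: each new node attaches to m distinct existing
    nodes chosen with probability proportional to their degree."""
    edges = []
    targets = list(range(m))  # the first new node links to the m seed nodes
    repeated = []             # each node appears once per incident edge
    for new in range(m, n):
        edges.extend((t, new) for t in targets)
        repeated.extend(targets)
        repeated.extend([new] * m)
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(repeated))
        targets = list(chosen)
    return edges


def generate_instance(n_nodes, n_entities, kind, seed=None, p=0.1, m=2):
    """Build one synthetic problem instance on a 2 x 2 battlefield."""
    rng = random.Random(seed)
    if kind == "ER":
        edges = er_edges(n_nodes, p, rng)
    elif kind == "SF":
        edges = ba_edges(n_nodes, m, rng)
    else:
        raise ValueError("kind must be 'ER' or 'SF'")
    side = 2.0  # side length of the square area

    def random_point():
        return (rng.uniform(0.0, side), rng.uniform(0.0, side))

    return {
        "edges": edges,
        "node_pos": [random_point() for _ in range(n_nodes)],
        "entity_pos": [random_point() for _ in range(n_entities)],
        "attack_range": [rng.uniform(0.0, 2.0) for _ in range(n_entities)],
        "attack_ability": [rng.uniform(0.0, 10.0) for _ in range(n_entities)],
        # Assumption: one alpha value per network node.
        "alpha": [rng.uniform(0.0, 1.0) for _ in range(n_nodes)],
    }
```

For example, `generate_instance(25, 15, "SF", seed=1)` yields an instance with the same numbers of nodes and entities as those used to train DRL-40.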
We trained two different Net-Cracker agents, named DRL-25 and DRL-40, respectively. The DRL-25 model was trained with 10 entities and 15 network nodes, while the DRL-40 model was trained with 15 entities and 25 network nodes. Each model was trained using 1 million data points.

Hyperparameter Setting
For each Net-Cracker agent, the encoder of the actor network embeds object information into a 128-dimensional vector using a one-layer 1D convolutional network, while the decoder is a GRU recurrent neural network with 128 hidden units and a dropout rate of 0.1. The critic network consists of multiple 1D convolutional layers, where the output dimension of the last layer is 1. The model uses the Adam optimizer for gradient optimization. The batch size is 128, and the learning rate is 10^-4.
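For reference, the settings listed above can be collected in a single configuration object. The class and field names below are hypothetical and chosen only for readability; the values are those stated in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NetCrackerConfig:
    """Hyperparameters reported in the text (field names are illustrative)."""
    embedding_dim: int = 128         # encoder output size per object
    encoder_conv_layers: int = 1     # one-layer 1D convolutional encoder
    decoder_hidden_units: int = 128  # GRU hidden size
    decoder_dropout: float = 0.1
    critic_output_dim: int = 1       # scalar baseline estimate
    optimizer: str = "adam"
    batch_size: int = 128
    learning_rate: float = 1e-4
```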

Benchmarks
The network disintegration problem defined in this paper has two decision spaces, i.e., selecting several nodes from the network and assigning appropriate entities to the selected nodes. However, classical disintegration strategies based on node centrality only decide which nodes should be destroyed and do not produce a resource allocation solution. Therefore, in this paper, we use two classical heuristic algorithms, i.e., the genetic algorithm and the differential evolution algorithm, as the benchmarks.

• Genetic Algorithm (GA): Genetic algorithms search for optimal solutions by simulating the processes of natural selection, inheritance and evolution, and are characterized by simplicity, robustness and strong global search capabilities. By simulating genetic processes such as selection, crossover and mutation, the GA gradually evolves solutions that better adapt to the given problem. We first encode the entities and nodes in the network, set the maximum number of iterations, and define the crossover and mutation operations of the GA. Next, we use the natural connectivity of the disintegrated network as the fitness function of the GA for individual evaluation. Finally, the optimal solution to the problem is output based on the fitness function.

• Differential Evolution Algorithm (DE): The differential evolution algorithm is an intelligent optimization search algorithm driven by cooperation and competition among individuals within a population. Its strengths include strong adaptability, few control parameters, simple settings and robust optimization results. The solution process of DE is the same as that of GA, but the mutation scale factor and crossover probability of DE are set differently from those of GA.
The main parameters of the GA and DE algorithms are given in Table 2, where Dim represents the dimension of the decision variables. We set the maximum number of iterations for the GA to 200 and 500, denoted as GA-200 and GA-500, respectively. Likewise, the maximum number of iterations for the DE algorithm is set to 200 and 500, referred to as DE-200 and DE-500, respectively.
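Both benchmarks evaluate individuals by the natural connectivity of the disintegrated network. A minimal sketch of such a fitness function is given below, using the standard definition of natural connectivity, ln((1/N) Σ_i exp(λ_i)), where λ_i are the eigenvalues of the adjacency matrix. To stay dependency-free, the sum Σ_i exp(λ_i) is computed as the trace of the matrix exponential through a truncated Taylor series. The function name, the truncation length, and the choice to evaluate the measure on the subgraph induced by the surviving nodes are assumptions of this sketch, and the series is only practical for small networks.

```python
import math


def natural_connectivity(n, edges, removed=(), terms=60):
    """Natural connectivity of the graph left after deleting `removed` nodes.

    n:       number of nodes, labelled 0..n-1.
    edges:   iterable of undirected edges (u, v).
    removed: nodes deleted by the attack (assumed to vanish with their edges).
    terms:   Taylor terms used for trace(exp(A)); enough for small graphs.
    """
    gone = set(removed)
    keep = [v for v in range(n) if v not in gone]
    size = len(keep)
    if size == 0:
        return 0.0  # convention chosen for this sketch: empty network scores 0
    index = {v: i for i, v in enumerate(keep)}
    adj = [[0.0] * size for _ in range(size)]
    for u, v in edges:
        if u in index and v in index:
            adj[index[u]][index[v]] = 1.0
            adj[index[v]][index[u]] = 1.0
    # term holds A^k / k!; start from the identity matrix (k = 0).
    term = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    trace_exp = float(size)
    for k in range(1, terms + 1):
        term = [
            [sum(term[i][l] * adj[l][j] for l in range(size)) / k
             for j in range(size)]
            for i in range(size)
        ]
        trace_exp += sum(term[i][i] for i in range(size))
    return math.log(trace_exp / size)
```

A GA or DE individual that encodes an entity-to-node assignment would first be decoded into the set of nodes actually removed, and `natural_connectivity` would then be called with that set; a smaller value indicates a more effective disintegration.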

Performance Results
In this part, we evaluate Net-Cracker and the benchmarks in terms of solving quality, solving speed and generalization ability.

Solving Quality
To systematically compare the quality of the solutions generated by Net-Cracker and the benchmarks, we conducted experiments in which the number of synthetic network nodes ranges from 40 to 160. For each type of network at a given size, ten problem instances are randomly generated and the average disintegration effect is calculated. The disintegration effect on the two synthetic networks is plotted in Figure 7. The smaller the objective value Φ, the better the solution quality of the algorithm. As shown in Figure 7, the solutions generated by Net-Cracker are superior to those generated by the benchmark methods in both the ER and SF networks. Specifically, for the ER network, regardless of the size of the problem instance, the average disintegration effect of the DRL-25 algorithm is slightly better than that of the other algorithms. Compared to the heuristic algorithms, Net-Cracker improves the solution quality by about 8∼10%. For the SF network, when the problem size is 40, 80 and 160, the disintegration performance of DRL-40 is significantly better than that of GA and DE. When the problem size is 120, the solution quality of the DRL-25 algorithm is also superior to that of GA and DE. Here, Net-Cracker improves the solution quality by about 50∼62%, indicating that it is more effective at disintegrating the SF network.
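The text does not state how the percentage improvements are computed. One plausible reading, sketched below under that assumption, is the relative reduction of the average objective value Φ with respect to a benchmark, averaged over the ten instances of a given network type and size. The function names and the sample values are hypothetical.

```python
def average_effect(objective_values):
    """Mean objective value over the instances of one experimental setting."""
    return sum(objective_values) / len(objective_values)


def relative_improvement(phi_benchmark, phi_method):
    """Relative reduction of Phi achieved by a method over a benchmark.

    Assumes smaller Phi is better and phi_benchmark is positive.
    """
    if phi_benchmark <= 0:
        raise ValueError("phi_benchmark must be positive")
    return (phi_benchmark - phi_method) / phi_benchmark


# Hypothetical averages, for illustration only (not results from the paper).
gain = relative_improvement(average_effect([2.0, 2.0]), average_effect([1.7, 1.9]))
```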

Solving Speed
The solution time is crucial for the network disintegration problem. Quickly generating a disintegration plan from entity and network data is of great significance for seizing the initiative in war. Since the Net-Cracker model used in this paper is end-to-end, only the test time in the application phase is considered when comparing with the benchmark algorithms, and the training time is not considered.
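A simple way to measure test-phase solution time, excluding any training cost, is to time only the call that produces a solution for each instance. The helper below is a generic sketch; `solver` stands for any callable (a trained agent's inference routine or a GA/DE run) and is a placeholder rather than an interface defined in this paper.

```python
import time


def time_solver(solver, instances, repeats=1):
    """Return the average wall-clock time in seconds that `solver` needs
    per instance, measured over `repeats` calls for each instance."""
    times = []
    for instance in instances:
        start = time.perf_counter()
        for _ in range(repeats):
            solver(instance)
        times.append((time.perf_counter() - start) / repeats)
    return times
```

The per-instance times returned by `time_solver` are the kind of data from which box plots and average-time heat maps can be drawn.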
We draw box plots of the solution times of the different algorithms on instances of the disintegration problem in ER and SF networks, as presented in Figures 8 and 9, respectively. We can see that Net-Cracker finds the solution faster than the benchmark methods, regardless of the problem size of the synthetic network instance. In addition, to further analyze the correlation between the solving speed of the algorithm and the scale of the network disintegration problem, we draw a heat map of the average solution time, as shown in Figure 10. As the problem size increases, the solution time of the GA and DE algorithms increases significantly. In particular, GA-500 and DE-500, when dealing with the same problem instances, achieve significantly better solution quality than GA-200 and DE-200, respectively, but their solution time is more than twice that of GA-200 and DE-200. In contrast, the DRL algorithm can find the solution within 2 seconds when facing problems of different scales, and its solution time fluctuates less.

Discussion
Net-Cracker is able to quickly find efficient solutions, addressing the low search efficiency of traditional methods in eliminating large-scale terrorist networks and blocking the spread of large-scale disease networks. Additionally, because of the strong generalization ability of Net-Cracker, even if the members of a terrorist network change or a disease-infected network varies, we do not need to spend a lot of time searching again for strategies to destroy the terrorist network or block the spread of the disease. Instead, we can find appropriate solutions quickly. However, there are still several limitations in this study. For example, the types of discrete entity resources used to address network problems are relatively limited. In addition, some complex networks in the real world have heterogeneous edges and nodes, and modeling them reasonably remains an important challenge for future work.

Conclusions
The disintegration of networks with discrete entity allocations is important in many areas, for example, eliminating terrorist networks by allocating weapon entity resources, and eliminating disease transmission networks by allocating cabin hospital resources. Finding the optimal disintegration solution among the many alternative strategies for network disintegration with discrete entity resource allocation is a challenging problem. In this paper, the effect of disintegration is evaluated by the natural connectivity of the network. To maximize this effect, we have designed a DRL-based method, Net-Cracker, which allocates the limited entity resources carefully to achieve the optimization goal. The customized design of Net-Cracker gives it good solution quality, solution speed and generalization ability. The results of extensive experiments illustrate that, compared with the metaheuristic algorithms, Net-Cracker improves the solution quality by about 8∼62%, while enabling a 30-to-160-fold speed-up.

Figure 1. A simple example to show network disintegration with limited entities.

Figure 2. Two different disintegration strategies. (a) Three entities attack three network nodes, only two of which can be removed. The network performance after disintegration is 2.12. (b) Three entities attack and remove two network nodes. The network performance after disintegration is 1.77.

Figure 3. A simple example to show that the network disintegration problem is NP-hard. Only v_1 and v_2 lie in the attack range of all entities.

Figure 4. The framework of Net-Cracker. In the combination stage, the entity set and node set are combined to form a new object set through the Cartesian product. In the selection stage, we select objects based on the neural network until a complete solution is constructed. The parameters of the neural network can be trained by the actor and critic networks. In the mapping stage, by mapping the objects into entity-node pairs, we can calculate the natural connectivity of the destroyed network.

Figure 5. Combining entities and nodes into new objects. Each object has the attributes of both an entity and a node.

Figure 6. The proposed neural network architecture in the AC framework. The encoder extracts the features of the input object through the embedding layer. The decoder is used to store the decoded sequence information. The attention module uses the attention mechanism to output the probability distribution of the next input according to the embedded information and the hidden-layer state of the decoding network.

Figure 7. The average disintegration effect in two synthetic networks by different algorithms.

Figure 8. The solution time of various algorithms for the ER network disintegration problem at the same scale.

Table 1. Summary of notations.

Table 2. The main parameters of the GA and DE.

Table 3. Comparison of the generalization ability of the algorithm.