Efficient Decision Approaches for Asset-Based Dynamic Weapon Target Assignment by a Receding Horizon and Marginal Return Heuristic

Abstract: The weapon-target assignment problem is a crucial decision support function in a command and control system. For asset-based dynamic weapon target assignment (A-DWTA), a typical operational scenario, existing models and solving algorithms struggle to reflect the actual requirements of decision makers. Derived from the "shoot-look-shoot" principle, an "observe-orient-decide-act" loop model for A-DWTA (OODA/A-DWTA) is established. Focusing on the decide phase of the OODA/A-DWTA loop, a novel A-DWTA model based on the receding horizon decomposition strategy (A-DWTA/RH) is established. To solve A-DWTA/RH efficiently, a heuristic algorithm based on statistical marginal return (HA-SMR) is designed, which follows a reverse hierarchical idea of "asset value-target selected-weapon decision." Experimental results show that HA-SMR solves A-DWTA/RH in real time and robustly. The obtained decision plan can fulfill the operational mission in fewer stages, and its "radical-conservative" degree can be adjusted adaptively by parameters.


Introduction
Weapon target assignment (WTA), also known as Weapon Allocation or Weapon Assignment (WA), refers to the reactive assignment of defensive weapons to counter identified threats [1]. With the development of advanced weapons and combat theory, it has become difficult for human decision-makers to solve the fire allocation problem effectively in complex operational environments [2]. WTA is therefore studied as a critical problem in intelligent decision support systems, with the aim of reducing the decision pressure on human decision-makers or replacing them.
The WTA problem was first introduced by Manne in 1959 [3]. In the following decades, various types of WTA problems evolved. By decision process, the WTA problem can be divided into static WTA (SWTA) [4] and dynamic WTA (DWTA) [5,6]; the difference between them is whether time is considered as a dimension. SWTA launches weapons in a single salvo to maximize operational effectiveness, whereas DWTA sequences the weapon assignments over multiple stages to obtain a balanced plan [7]. By operational mission, there are mainly target-based WTA (T-WTA) [8], asset-based WTA (A-WTA) [9], and sensor-WTA (S-WTA) [10,11] models. The T-WTA model adopts the kill effectiveness of weapons against targets as the optimization objective. The A-WTA problem aims to maximize the survival value of own assets. S-WTA considers the collaboration between sensors and weapons. This paper focuses on the A-DWTA problem.
The main contributions of this paper are as follows.
1. An OODA/A-DWTA loop model is established to support the subsequent A-DWTA decision model and solving algorithm.
2. To reflect actual operational requirements, an A-DWTA decision model based on the receding horizon strategy is presented. The "radical-conservative" degree of the obtained plan, which relates to the number of decision stages, can be adjusted adaptively by the model parameter.
3. A heuristic algorithm based on statistical marginal return is proposed to solve the A-DWTA model, with the advantages of robustness and real-time performance. Extensive experiments demonstrate the effectiveness of the proposed approaches.
The rest of this paper is organized as follows. Section 2 gives our motivations and formulates the OODA/A-DWTA loop model. Section 3 formulates the objective and constraints of the A-DWTA decision model. Section 4 presents the proposed HA-SMR for the A-DWTA model. Section 5 verifies the proposed HA-SMR solving the A-DWTA problem by experimental studies. The conclusion is finally summarized in Section 6.

OODA/A-DWTA Loop Model
The OODA loop theory was put forward by John Boyd in the 1970s. His viewpoint is that the execution process of a fire strike can be divided into an observe-orient-decide-act cycle, and that in a conflict the party that completes the OODA loop faster gains the advantage. Therefore, the antagonistic goal of both sides is to accelerate their own OODA cycle while cutting off or delaying the enemy's OODA cycle. After half a century of testing, OODA loop theory has been widely used in military affairs, economic warfare, competitive sports, and other fields.
The operational scenario of A-DWTA is as follows: enemy units launch an attack to destroy assets that have military value but no defensive ability, such as military installations, personnel gathering places, command and control nodes, weapon warehouses, and ports. The defense unit with combat ability acts as the decision-maker and aims to maximize the survival value of the assets over multiple attack-defense stages by allocating weapon resources to kill the enemy targets, as shown in Figure 1. The A-DWTA problem based on the "shoot-look-shoot" mechanism conforms to the typical OODA loop theory. In the observe phase, the decision-maker updates the external information (target state, asset state) of the current stage. In the orient phase, the state information collected in the observe phase is decomposed and reconstructed, for example through threat assessment based on target track information, identification of the targets' attack intentions against assets, determination of the feasibility of each weapon attacking each target, and determination of the attack request. If the termination conditions are met, the weapon system is guided to exit the operational environment; otherwise, the OODA loop continues. In the decide phase, the specific combat requirements are converted into objective functions, which are solved by the designed weapon-target allocation algorithm. In the act phase, the weapon-target assignment plan given by the decide phase is executed, the weapon states are updated, and the damage to the targets and the value of the assets are evaluated. A diagram of the OODA/A-DWTA loop model is shown in Figure 2. The notation used in this paper is listed in Table 1.

Table 1. Notation.
$m$: number of available weapons at the initial stage
$n$: number of hostile targets at the initial stage
$l$: number of defended assets at the initial stage
$s$: index of the decision stage, $s = 1, 2, \ldots, s_m$
$W(s) = [w_i(s)]_{1\times m}$: weapon state of stage $s$; $w_i(s) = 1$ denotes that weapon $i$ is available at the decide phase of stage $s$, otherwise $w_i(s) = 0$
$T(s) = [t_j(s)]_{1\times n}$: target state of stage $s$; $t_j(s) = 1$ denotes that target $j$ is threatening at the observe phase of stage $s$, otherwise $t_j(s) = 0$
$A(s) = [a_k(s)]_{1\times l}$: asset state of stage $s$; $a_k(s) = 1$ denotes that asset $k$ survives at the observe phase of stage $s$, otherwise $a_k(s) = 0$
$P(s) = [p_{ij}(s)]_{m\times n}$: weapon-target kill probability of stage $s$; $p_{ij}(s)$ is the kill probability of weapon $i$ against target $j$ at the orient phase of stage $s$
$C(s) = [c_{ij}(s)]_{m\times n}$: weapon-target attack condition of stage $s$; $c_{ij}(s) = 1$ indicates that weapon $i$ satisfies the attack condition for intercepting target $j$ at stage $s$, otherwise $c_{ij}(s) = 0$
$Q(s) = [q_{jk}(s)]_{n\times l}$: target-asset intention matrix of stage $s$; $q_{jk}(s) > 0$ denotes that an attack intention of target $j$ against asset $k$ is assessed at the orient phase of stage $s$, with destroy probability $q_{jk}(s)$; otherwise $q_{jk}(s) = 0$
$D(s) = [d_{ij}(s)]_{m\times n}$: weapon-target decision variable of stage $s$; $d_{ij}(s) = 1$ indicates that weapon $i$ is assigned to intercept target $j$ at the decide phase of stage $s$, otherwise $d_{ij}(s) = 0$
$h_j(s)$: survival probability of target $j$ after the act phase of stage $s$
$U^o(s) = [u^o_k(s)]_{1\times l}$: asset survival probability at the observe phase of stage $s$
$U^a(s) = [u^a_k(s)]_{1\times l}$: asset survival probability after the act phase of stage $s$

Observe Phase
In the OODA/A-DWTA loop, the observation model describes the survival state of the assets and the damage state of the targets at the current stage in an uncertain battlefield environment. (1) Target state observation. Let the current stage be $s$. The observed target state $T(s) = [t_j(s)]_{1\times n}$ is the actual survival state of the targets after the act phase of stage $s-1$. Therefore, the observed state of each target at stage $s$ follows a Bernoulli distribution parameterized by its survival probability at stage $s-1$:
$$t_j(s) \sim B\big(1,\ h_j(s-1)\big), \quad j = 1, 2, \ldots, n \tag{1}$$
where $t_j(s) = 0$ indicates that target $j$ is observed as damaged in the observe phase, and $t_j(s) = 1$ indicates that target $j$ is observed as a threat; $h_j(s-1)$ is the expected survival probability of target $j$ at the end of stage $s-1$, which is given by the damage assessment of the act phase.
(2) Asset state observation. Similarly, the observed asset state $A(s)$ of stage $s$ follows a Bernoulli distribution parameterized by the expected asset survival probability of stage $s-1$. This expectation depends on the expected target survival probabilities in the act phase of stage $s-1$, and the observed target state becomes available at stage $s$. Therefore, in the observation model of OODA/A-DWTA, the expected asset survival probability $U^o(s-1) = [u^o_k(s-1)]_{1\times l}$ of stage $s-1$ is modeled as the posterior probability conditioned on the observed target state $T(s)$ of stage $s$, for $k = 1, 2, \ldots, l$ (Equation (2)). The observed asset state $A(s) = [a_k(s)]_{1\times l}$ then follows
$$a_k(s) \sim B\big(1,\ u^o_k(s-1)\big), \quad k = 1, 2, \ldots, l$$
where $a_k(s) = 1$ indicates that asset $k$ is observed as surviving at stage $s$, and $a_k(s) = 0$ indicates that asset $k$ is observed as destroyed.
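The two Bernoulli observations can be sketched in Python as follows. This is a minimal illustration rather than the paper's implementation: the function name `observe_states` is hypothetical, the posterior probabilities $u^o_k(s-1)$ are taken as given inputs, and all draws are assumed independent.

```python
import random


def observe_states(h_prev, u_prev, rng):
    """Sample the observed target and asset states of stage s (hypothetical helper).

    h_prev[j]: survival probability h_j(s-1) of target j after stage s-1.
    u_prev[k]: survival probability u^o_k(s-1) of asset k (assumed already computed).
    Returns (T, A) as 0/1 lists: 1 = threatening target / surviving asset.
    """
    # t_j(s) ~ B(1, h_j(s-1)); rng.random() lies in [0, 1), so a probability
    # of 0 never yields 1 and a probability of 1 always yields 1.
    T = [1 if rng.random() < h else 0 for h in h_prev]
    # a_k(s) ~ B(1, u^o_k(s-1))
    A = [1 if rng.random() < u else 0 for u in u_prev]
    return T, A


rng = random.Random(0)
T, A = observe_states([0.0, 1.0, 0.3], [1.0, 0.5], rng)
print(T, A)
```

Seeding a dedicated `random.Random` instance keeps the simulated observations reproducible across runs.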

Orient Phase
In the OODA/A-DWTA loop, the orient model determines the attack request and updates the target-asset attack intention and the weapon-target attack condition using the observed information. (1) Attack request determination. The OODA/A-DWTA loop terminates when any of the following conditions is met; otherwise, an attack request is initiated and the loop enters the decide phase: (a) all assets are destroyed, $\|A(s)\|_1 = 0$, where $\|\cdot\|_1$ is the $l_1$-norm; (b) all targets are damaged, $\|T(s)\|_1 = 0$; (c) no weapon is available, $\|W(s)\|_1 = 0$. The termination condition of OODA/A-DWTA is therefore
$$\|A(s)\|_1 = 0 \ \lor\ \|T(s)\|_1 = 0 \ \lor\ \|W(s)\|_1 = 0 \tag{3}$$
(2) Target-asset attack intention. In actual combat, identifying an enemy target's attack intention requires analyzing the target's altitude, distance, speed, acceleration, heading angle, azimuth, fire control radar state, and maneuver type, and the intention is predicted using domain knowledge. WTA research focuses on the decision model and algorithm of the command and control layer rather than on intention recognition methods. Hence, we use a simplified target-asset intention matrix $Q = [q_{jk}]_{n\times l}$ to describe the attack state of the enemy targets at each stage. $q_{jk}(s) = 0$ means that no attack intention of target $j$ against asset $k$ is recognized at stage $s$; otherwise, target $j$ intends to attack asset $k$ with destroy probability $q_{jk}(s)$.
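The termination test of conditions (a) to (c) can be sketched as below, assuming the states are stored as 0/1 lists so that the $l_1$-norm reduces to a plain sum; the function name `should_terminate` is hypothetical.

```python
def should_terminate(A, T, W):
    """Return True if the OODA/A-DWTA loop must stop (hypothetical helper).

    A, T, W are 0/1 state vectors for assets, targets, and weapons; for such
    vectors the l1-norm is simply the sum of the entries.
    """
    all_assets_destroyed = sum(A) == 0   # condition (a)
    all_targets_damaged = sum(T) == 0    # condition (b)
    no_weapon_available = sum(W) == 0    # condition (c)
    return all_assets_destroyed or all_targets_damaged or no_weapon_available


print(should_terminate(A=[1, 0], T=[1, 1], W=[0, 1]))
```

If the function returns False, the loop raises an attack request and proceeds to the decide phase.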
(3) Weapon-target attack condition. The weapon-target attack condition is mainly determined by whether the time window for fire control launch is satisfied. In DWTA, as the offensive and defensive stages proceed, the number of weapons that meet the attack conditions usually decreases. We introduce the lethality matrix $P = [p_{ij}]_{m\times n}$ and the feasibility matrix $C = [c_{ij}]_{m\times n}$ to describe the weapon-target attack condition of each stage, where $p_{ij}(s) \in [0, 1)$ is the destroy probability of weapon $i$ against target $j$, and $c_{ij}(s) = 1$ indicates that weapon $i$ meets the attack condition for intercepting target $j$, otherwise $c_{ij}(s) = 0$.

Decide Phase
As the critical component of the OODA/A-DWTA loop, the decide model aims to obtain the optimal weapon-target assignment under the current situation, maximizing the expected survival value of the assets:
$$D^*(s) = \arg\max_{D(s)} F(V, Q, P, A, T, W, C, D, s) \tag{4}$$
where $F(\cdot)$ is the objective function of the A-DWTA decision model, which directly reflects the operational requirement of the command and control (C2) system. The algorithm used to solve Formula (4) determines how intelligent the weapon-target assignment is. As the focus of this paper, the A-DWTA decision model is studied in detail in the next section.

Act Phase
According to the weapon-target assignment $D(s)$ obtained at the current stage, the act model updates the weapon states and assesses target damage and asset survival value. (1) Weapon state update. After the weapon allocation plan is executed, the weapon state $W(s) = [w_i(s)]_{1\times m}$ of the current stage is determined by the weapon state $W(s-1)$ of the previous stage and the decision plan $D(s)$ of the current stage,
where $w_i(s) = 0$ denotes that weapon $i$ is not available at stage $s$, otherwise $w_i(s) = 1$.
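Assuming single-shot weapons ($m_i = 1$), the weapon update can be sketched as the set difference $W(s+1) = W(s) \setminus D(s)$, the form the text uses later for the prediction model; the function name `update_weapons` is hypothetical.

```python
def update_weapons(W, D):
    """W(s+1) = W(s) minus the weapons used in D(s), assuming single-shot weapons.

    W[i] in {0, 1}: availability of weapon i.
    D[i][j] in {0, 1}: 1 if weapon i was assigned to target j.
    """
    # A weapon stays available only if it was available and not assigned at all.
    return [w if sum(row) == 0 else 0 for w, row in zip(W, D)]


print(update_weapons([1, 1, 0], [[0, 1], [0, 0], [0, 0]]))
```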
(2) Target damage assessment. The target survival probability is mainly determined by the target state, the weapon-target kill probability, and the decision plan at the current stage:
$$h_j(s) = t_j(s) \prod_{i=1}^{m} \big(1 - p_{ij}(s)\big)^{d_{ij}(s)}, \quad j = 1, 2, \ldots, n \tag{6}$$
where $h_j(s) \in [0, 1]$. If and only if $t_j(s) = 0$, that is, target $j$ has been confirmed as damaged in the observe phase of stage $s$, $h_j(s) = 0$; if and only if $t_j(s) = 1$ and $\sum_{i=1}^{m} d_{ij}(s) = 0$, that is, no weapon is assigned to the threatening target $j$, $h_j(s) = 1$.
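The target damage assessment can be sketched under the assumption that Equation (6) takes the standard product form $h_j(s) = t_j(s)\prod_{i=1}^{m}(1-p_{ij}(s))^{d_{ij}(s)}$, which reproduces both boundary cases stated in the text; the function name is hypothetical.

```python
import math


def target_survival(T, P, D):
    """h_j = t_j * prod_i (1 - p_ij) ** d_ij (assumed form of Equation (6)).

    T[j] in {0, 1}; P[i][j] kill probability; D[i][j] in {0, 1}.
    """
    m, n = len(P), len(T)
    return [T[j] * math.prod((1 - P[i][j]) ** D[i][j] for i in range(m))
            for j in range(n)]


# Target 0 is attacked by both weapons, target 1 by none, target 2 is already damaged.
h = target_survival(T=[1, 1, 0],
                    P=[[0.8, 0.5, 0.9], [0.5, 0.0, 0.9]],
                    D=[[1, 0, 0], [1, 0, 1]])
print(h)
```

An exponent $d_{ij} = 0$ contributes a factor of 1, so unassigned weapons leave the product unchanged.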
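(3) Asset survival value assessment. The asset survival probability after the act phase of stage $s$ is
$$u^a_k(s) = a_k(s) \prod_{j=1}^{n} \big(1 - h_j(s)\, q_{jk}(s)\big), \quad k = 1, 2, \ldots, l$$
where the term $h_j(s) q_{jk}(s)$ represents the probability that target $j$ destroys asset $k$ in the act phase of stage $s$. The expected asset survival value $F(s) = [f_k(s)]_{1\times l}$ is evaluated as
$$f_k(s) = v_k\, u^a_k(s), \quad k = 1, 2, \ldots, l$$
If and only if $a_k(s) = 0$, that is, asset $k$ has been confirmed as destroyed in the observe phase of stage $s$, $f_k(s) = 0$; if and only if $a_k(s) = 1$ and $\sum_{j=1}^{n} q_{jk}(s) = 0$, that is, no target has an attack intention against asset $k$, $f_k(s) = v_k$.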
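Similarly, assuming the product forms $u^a_k(s) = a_k(s)\prod_{j}(1 - h_j(s)q_{jk}(s))$ and $f_k(s) = v_k u^a_k(s)$, which agree with the boundary cases in the text, the asset assessment can be sketched as follows; the function name is hypothetical.

```python
import math


def asset_values(A, H, Q, V):
    """Return (U, F): asset survival probabilities and expected survival values.

    A[k] in {0, 1}: observed asset state; H[j]: target survival probability h_j;
    Q[j][k]: destroy probability of target j against asset k; V[k]: asset value.
    """
    n, l = len(H), len(A)
    # u^a_k = a_k * prod_j (1 - h_j q_jk)   (assumed product form)
    U = [A[k] * math.prod(1 - H[j] * Q[j][k] for j in range(n)) for k in range(l)]
    # f_k = v_k * u^a_k
    F = [V[k] * U[k] for k in range(l)]
    return U, F


U, F = asset_values(A=[1, 1, 0], H=[0.5, 1.0],
                    Q=[[0.4, 0.0, 0.9], [0.0, 0.0, 0.5]], V=[10, 5, 8])
print(U, F)
```

Asset 1 has no attacker, so it keeps its full value $v_1 = 5$; asset 2 was already destroyed, so its value is 0.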
In summary, the OODA/A-DWTA loop model is composed of the observe, orient, decide, and act models described above.

A-DWTA/RH Formulation
In the A-DWTA decision model, it is assumed that there are $s_m$ stages of attack-defense decision-making before the enemy targets are annihilated or the assets are destroyed. At stage $s$, the input variables are the destroy effectiveness $Q$ of the enemy targets against the assets, obtained by threat assessment; the survival value $V$ of the assets; the defensive effectiveness $P$ of the weapons against the targets; and the weapon-target attack constraints $C$. The output variable is the defense decision matrix $D$. The goal of the A-DWTA problem is to solve the decision model algorithmically under the current situation $\{V, Q, P, A, T, W, C, D, s\}$, obtain an optimal firepower decision plan $D^*$, and maximize the reduction of the enemy targets' threat to the assets.
In this formulation, the design of the objective function $F$ directly reflects the operational purpose of the command and control system, the constraints reflect the battlefield environment, and the solving algorithm determines how reasonable the firepower decision is.
Because the standard A-DWTA model pursues a mathematically optimal solution that is not tightly integrated with operational requirements, this section establishes a two-stage A-DWTA model based on the receding horizon strategy (A-DWTA/RH).

Objective Building
The main objective function of the A-DWTA model is derived from the classic static WTA model and is formulated as the sum of the expected asset values over all stages (problem (11)), where $s_m$ is the maximum number of decision stages; the decision set $\mathcal{X}_t = \{X_t, X_{t+1}, \ldots, X_{s_m}\}$ is the global decision sequence starting from the current stage $t$, and $X_s = [x_{ij}(s)]_{m\times n}$ is the local decision of stage $s$. To solve problem (11), existing algorithms set the maximum number of stages $s_m$, predict the target and asset states of all subsequent stages at the current stage, and search for the global optimal solution from the current stage to stage $s_m$. In contrast, we believe that the number of decision stages required is one of the most crucial metrics of an A-DWTA algorithm. The primary task of decision-makers is still to kill the targets efficiently and to protect the assets from damage cost-effectively in the fewest stages. Second, given the uncertainty of weapons killing targets, the remaining weapons must retain the ability to counter surviving targets in subsequent stages. Following this idea, we propose a two-stage A-DWTA model based on the receding horizon strategy. The design of the objective function follows two points: (1) from the perspective of the decision-maker, it should first ensure efficient and cost-effective killing of targets at the current stage in order to protect our assets; (2) assuming that some targets have not been killed after the act phase of the current stage, the remaining weapons should retain sufficient countermeasures against the enemy targets. Accordingly, the objective function of the A-DWTA/RH model is composed of two parts.
The first part is the expected direct return of the weapons killing targets at the current stage, and the second part is the predicted return of the remaining weapons against the undamaged targets in the next stage:
$$\max F(V, Q, P, A, T, W, C, D, s) = \lambda f_1(V, Q, P, A, D, s) + \cdots$$
where $f_1$ represents the absolute return expectation of the decision variable $D$ at the current stage;

Absolute Return Expectation at the Current Stage
The absolute return expectation at the current stage can be expressed by the combat effectiveness of the weapons used at the current stage, namely the expected survival value of the assets under the weapon-target assignment scheme. Let the current stage be $s$ and the decision variable be $D(s)$; the normalized combat effectiveness of the current stage is defined accordingly, and substituting Formula (6) into this definition expresses it directly in terms of $D(s)$.

Return Evaluation of Remaining Weapons on Prediction Situation
Since the weapon/target/asset states and the target attack intentions of the next stage cannot be obtained at the current stage, a method for evaluating the mission return under the predicted situation is needed. The designed evaluation method is as follows: because of the event "weapons attack the target," the target's threat to the asset is attenuated and the expected value of the asset rises. Therefore, according to the state distribution of the next stage's situation information, the expected asset value under the "threatening target-surviving asset" situation is calculated first. Then the expected value under the "surplus weapon-threatening target-surviving asset" situation is calculated. The difference between the two expected values is used as the return expectation of the remaining weapons in the predicted situation.
Based on this idea, after the weapon-target assignment plan $D(s)$ is executed at stage $s$, the OODA/A-DWTA loop established in Section 2 gives the weapon state $W(s+1) = W(s) \setminus D(s)$, the target state $t_j(s+1) \sim B(1, h_j(s))$, and the asset state distribution subject to (6). The return expectation of each "remaining weapon-threatening target-surviving asset" combination under these distributions is used as the return of the remaining weapons in the predicted situation.
First, ignoring the weapons $W(s+1)$, the predicted threat of the targets $T(s+1)$ to the assets $A(s+1)$ is evaluated by Equation (15). Then, taking into account the damage inflicted by the weapons $W(s+1)$ on the targets $T(s+1)$, the predicted threat to the assets $A(s+1)$ is evaluated by Equation (16). Normalizing the difference between Equations (15) and (16) gives the prediction return of the remaining weapons in the next stage.
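A hedged sketch of this difference-based return follows. It rests on assumptions not confirmed by the source: the next-stage states are independent Bernoulli variables $t_j(s+1) \sim B(1, h_j(s))$ and $a_k(s+1) \sim B(1, u_k)$, so expectations of products factorize; the remaining weapons follow a hypothetical next-stage plan `D_next`; and the difference is normalized by the total expected asset value. All function names are hypothetical.

```python
import math


def expected_asset_value(U, H, Q, V):
    """E[sum_k v_k a_k prod_j (1 - t_j q_jk)] for independent Bernoulli a_k and t_j.

    Independence lets each expectation factorize: E[1 - t_j q_jk] = 1 - h_j q_jk.
    """
    return sum(V[k] * U[k] * math.prod(1 - H[j] * Q[j][k] for j in range(len(H)))
               for k in range(len(V)))


def remaining_weapon_return(U, H, Q, V, P, D_next):
    """Normalized gain in expected asset value when the remaining weapons attack."""
    base = expected_asset_value(U, H, Q, V)          # "threatening target-surviving asset"
    H_att = [H[j] * math.prod((1 - P[i][j]) ** D_next[i][j] for i in range(len(P)))
             for j in range(len(H))]                  # targets after the extra attacks
    with_weapons = expected_asset_value(U, H_att, Q, V)
    total = sum(v * u for v, u in zip(V, U))          # hypothetical normalizer
    return (with_weapons - base) / total if total > 0 else 0.0


r = remaining_weapon_return(U=[1.0], H=[1.0], Q=[[0.5]], V=[10.0],
                            P=[[0.6]], D_next=[[1]])
print(r)
```

Here one remaining weapon with kill probability 0.6 raises the expected asset value from 5 to 8, giving a normalized return of 0.3.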

Constraints
In the A-DWTA model, the necessary mathematical constraints include the 0-1 integer constraints on the decision variables and the total constraint on the weapons. In addition, model constraints can be designed according to the specific DWTA model. Commonly used model constraints include the multi-target attack constraint and the weapon consumption constraint on a single target.
In these model constraints, the first term is the multi-target attack constraint, which limits the number of targets $m_i$ that one weapon can attack at one stage. In the coding of WTA models, a weapon platform is usually decomposed into single weapons, that is, $m_i = 1$. The second term is the weapon consumption constraint on a single target, which sets the upper limit $n_i$ on the number of weapons allocated to one target at one stage. We consider that constraint (19) is motivated by controlling weapon consumption at each stage, preventing excessive consumption of weapons and preventing the DWTA model from degenerating into an SWTA model. In an auxiliary decision-making system, however, this constraint is not an operational index that the decision-maker focuses on. Furthermore, manually setting $n_i$ adds an extra decision-making burden and does not directly reflect an improvement in combat effectiveness.
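A sketch of checking a candidate plan against the 0-1, weapon availability, multi-target attack, and single-target consumption constraints. The per-weapon and per-target limits are passed as lists, and all names are hypothetical.

```python
def satisfies_constraints(D, W, max_targets_per_weapon, max_weapons_per_target):
    """Check a plan D (m x n, entries 0/1) against the basic WTA constraints.

    W[i]: availability of weapon i; max_targets_per_weapon[i] plays the role of m_i;
    max_weapons_per_target[j] is the per-target weapon limit.
    """
    m, n = len(D), len(D[0])
    for i in range(m):
        if any(d not in (0, 1) for d in D[i]):        # 0-1 integer constraint
            return False
        assigned = sum(D[i])
        if assigned > 0 and W[i] == 0:                # only available weapons fire
            return False
        if assigned > max_targets_per_weapon[i]:      # multi-target attack limit
            return False
    for j in range(n):
        if sum(D[i][j] for i in range(m)) > max_weapons_per_target[j]:
            return False                              # single-target consumption limit
    return True


print(satisfies_constraints([[1, 0], [0, 1]], [1, 1], [1, 1], [1, 1]))
```

With single-shot weapons, `max_targets_per_weapon` is simply a list of ones.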
To quantify the cost-effectiveness ratio of the decision plan more accurately and to reduce the burden on decision-makers, we convert constraint (19) into a gradient limit on the target survival value, based on the characteristics of the weapon consumption curve. The attenuation of the firepower return is employed as the constraint metric (Equation (20)), where $\rho$ is the weapon consumption threshold parameter. In summary, the proposed A-DWTA/RH model consists of the two-part objective together with these constraints.

HA-SMR Algorithm
This section analyzes the limitations of the heuristic information commonly used in the current WTA algorithms, proposes heuristic information based on statistical marginal return, and designs an A-DWTA solving algorithm based on this heuristic information.

Algorithm Framework
Decomposing the offensive and defensive relationships of the A-DWTA/RH decision model shows that there are two types of decision plans: (1) the target-asset attack plan; (2) the weapon-target assignment plan. The return of the A-DWTA/RH objective function therefore flows along "weapon-target-asset." Accordingly, the proposed algorithm proceeds in the reverse hierarchical order "asset value-target selected-weapon decision." First, from the perspective of asset defense, the expected value loss of each asset is calculated from the current weapon-target plan, the target-asset intention, and the target-asset destroy probability, and the asset with the maximum expected value loss is selected as the priority defense asset. Second, the expected value loss caused by each target attacking the priority defense asset is calculated, and the target with the maximum value is selected as the priority attack target. Finally, the weapon to assign is selected using the heuristic information. The decision plan is updated, and the selected weapon is removed from the set of available weapons. The state of the decision-making system is then updated, and the above operations are repeated until the algorithm meets the termination condition. In this iterative process, the diversity of the decision set and the real-time performance of the algorithm are maintained by the objective function.
The main loop of HA-SMR is outlined in Algorithm 1. The flow diagram of HA-SMR is shown in Figure 3.

Priority of Target and Asset
To facilitate understanding of the proposed algorithm, we first define the key variable linking the "weapon-target" and "target-asset" situations in the A-DWTA model: the equivalent conditional kill probability.

Definition 1 (Equivalent Conditional Kill Probability).
In the A-DWTA model, given a determined weapon-target assignment plan, the expected kill probability of the targets against the assets under the "weapon-target-asset" situation is defined as the equivalent conditional kill probability.
First, let the current weapon-target decision matrix be $D$ and the target-asset destroy matrix be $Q$. The equivalent conditional kill probability matrix of weapon-target-asset, $Q' = [q'_{jk}]_{n\times l}$, is
$$q'_{jk} = q_{jk}\, t_j \prod_{i=1}^{m} (1 - p_{ij})^{d_{ij}}, \quad j = 1, \ldots, n;\ k = 1, \ldots, l \tag{23}$$
Second, the target-asset loss value expectation matrix $L = [g_{jk}]_{n\times l}$ is calculated from the target-asset equivalent conditional damage probability $Q'$ and the asset survival value $V$, where $I_l = [1]_{l\times 1}$. The element $g_{jk}$ then represents the expected value loss of the two-tuple "target $j$-asset $k$" under the current decision matrix $D$. According to the maximum marginal return strategy, the maximum element of the loss value expectation matrix $L$ is extracted to determine the priority defense asset $a_p$, and the priority attack target $t_p$ is then selected (Equation (25)), where $e_{ij}$ represents the expected marginal return, in terms of the value of asset $k_a$, generated by weapon $i$ attacking target $j$, and $k_a$ is the asset that target $j$ intends to attack; $q'_{hk}$ denotes the equivalent conditional kill probability of target $h$ against asset $k$.
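A hedged sketch of the equivalent conditional kill probability and the priority selection follows. It assumes $q'_{jk} = q_{jk}\,t_j\prod_i(1-p_{ij})^{d_{ij}}$, that is, target $j$'s survival probability times its destroy probability against asset $k$. Because the exact loss matrix $L$ is not reproduced here, the sketch uses a hypothetical asset loss $v_k\big(1-\prod_j(1-q'_{jk})\big)$ and then picks the target with the largest $q'_{j a_p}$, mirroring the two-step description in the algorithm framework. Function names are hypothetical.

```python
import math


def equivalent_kill_prob(T, P, D, Q):
    """q'_jk = q_jk * t_j * prod_i (1 - p_ij) ** d_ij  (assumed form)."""
    m, n, l = len(P), len(T), len(Q[0])
    surv = [T[j] * math.prod((1 - P[i][j]) ** D[i][j] for i in range(m))
            for j in range(n)]
    return [[Q[j][k] * surv[j] for k in range(l)] for j in range(n)]


def priority_pair(Qe, V):
    """Hypothetical two-step selection: priority asset a_p, then priority target t_p."""
    n, l = len(Qe), len(V)
    # expected value loss of each asset under the current plan (assumed form)
    asset_loss = [V[k] * (1 - math.prod(1 - Qe[j][k] for j in range(n)))
                  for k in range(l)]
    a_p = max(range(l), key=lambda k: asset_loss[k])
    # the target most likely to destroy the priority asset
    t_p = max(range(n), key=lambda j: Qe[j][a_p])
    return a_p, t_p


Qe = equivalent_kill_prob(T=[1, 1], P=[[0.5, 0.0]], D=[[1, 0]],
                          Q=[[0.8, 0.0], [0.0, 0.6]])
a_p, t_p = priority_pair(Qe, V=[10.0, 10.0])
print(Qe, a_p, t_p)
```

Target 0 is already engaged, which halves its threat to asset 0, so the unengaged threat to asset 1 becomes the priority.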
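Proof of Theorem 1. Under the current decision $D$ and situation $\{V, P, Q, C, A, T, W\}$, the target-asset equivalent conditional kill probability is $Q'$, and the value expectation of each asset is
$$\bar{v}_k = v_k \prod_{h=1}^{n} (1 - q'_{hk}), \quad k = 1, \ldots, l \tag{28}$$
Take, for example, a target $j_t$ with an attack intention against asset $k_a$, i.e., $q_{j_t k_a} > 0$. If weapon $i_w$ is available and the assignment $i_w - j_t$ is made, the equivalent conditional kill probability of target $j_t$ against asset $k_a$ becomes
$$q''_{j_t k_a} = q'_{j_t k_a} \big(1 - p_{i_w j_t}\big)$$
Then the value expectation of asset $k_a$ is
$$\bar{v}'_{k_a} = v_{k_a} \big(1 - q''_{j_t k_a}\big) \prod_{h \neq j_t} \big(1 - q'_{h k_a}\big) \tag{29}$$
The difference between Formulas (29) and (28) gives the transfer function from weapon $i_w$ attacking target $j_t$ to the expected marginal return of asset $k_a$:
$$\bar{v}'_{k_a} - \bar{v}_{k_a} = v_{k_a}\, p_{i_w j_t}\, q'_{j_t k_a} \prod_{h \neq j_t} \big(1 - q'_{h k_a}\big) \tag{30}$$
Substituting the equivalent conditional kill probability of Formula (23) into the above formula, and without loss of generality, the weapon-target marginal return expectation matrix $E = [e_{ij}]_{m\times n}$ is
$$e_{ij} = w_i\, c_{ij}\, v_{k_a}\, p_{ij}\, q'_{j k_a} \prod_{h \neq j} \big(1 - q'_{h k_a}\big), \quad i = 1, \ldots, m;\ j = 1, \ldots, n \tag{31}$$
It follows that the matrix $E$ is constrained: $e_{ij} = 0$ in the following three situations: (1) weapon $i$ is unavailable ($w_i = 0$); (2) target $j$ has been destroyed ($t_j = 0$, hence $q'_{j k_a} = 0$); (3) weapon $i$ is available and target $j$ is not damaged, but the pair $i - j$ does not satisfy the attack condition, $c_{ij} = 0$.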
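The marginal return matrix can be checked numerically. The sketch assumes the product-form asset value $v_k\prod_h(1-q'_{hk})$, under which assigning weapon $i$ to target $j$ yields $e_{ij} = w_i c_{ij} v_{k_a} p_{ij} q'_{jk_a}\prod_{h\neq j}(1-q'_{hk_a})$. The helper names are hypothetical, and the last usage lines compare this closed form with a brute-force before/after difference.

```python
import math


def asset_value(q_col, v):
    """Expected surviving value v * prod_h (1 - q'_h) of one asset."""
    return v * math.prod(1 - q for q in q_col)


def marginal_return_matrix(W, C, P, Qe, V, k_a):
    """e_ij = w_i c_ij v_ka p_ij q'_{j,ka} prod_{h != j} (1 - q'_{h,ka})."""
    m, n = len(P), len(Qe)
    col = [Qe[j][k_a] for j in range(n)]              # threats against asset k_a
    E = [[0.0] * n for _ in range(m)]
    for j in range(n):
        rest = math.prod(1 - col[h] for h in range(n) if h != j)
        for i in range(m):
            E[i][j] = W[i] * C[i][j] * V[k_a] * P[i][j] * col[j] * rest
    return E


Qe = [[0.5], [0.4]]          # two targets, one asset (k_a = 0)
E = marginal_return_matrix(W=[1], C=[[1, 1]], P=[[0.6, 0.3]], Qe=Qe, V=[10.0], k_a=0)
# Brute force for weapon 0 -> target 0: q'_0 drops to 0.5 * (1 - 0.6).
brute = asset_value([0.5 * (1 - 0.6), 0.4], 10.0) - asset_value([0.5, 0.4], 10.0)
print(E, brute)
```

Both routes give $e_{00} = 1.8$, which illustrates that the closed form is exactly the before/after difference of the asset value expectation.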
The information flow diagram of the calculation of the weapon-target marginal return expectation is shown in Figure 4.

Heuristic Information Design
Under the reverse hierarchical solution strategy proposed above, the A-DWTA model is transformed into a weapon selection problem similar to the SWTA model. In WTA problems, the heuristic information for weapon selection is usually constructed to maximize combat efficiency from the current perspective [10,29,36]. First, the comprehensive kill efficiency matrix $G = [g_{ij}]_{m\times n}$ is calculated from the target value $V$ and the kill probability matrix $P = [p_{ij}]_{m\times n}$:
$$g_{ij} = v_j\, p_{ij}, \quad i = 1, 2, \ldots, m;\ j = 1, 2, \ldots, n$$
Depending on the decision-making perspective, heuristic information based on killing effectiveness can be roughly divided into weapon-priority and target-priority heuristic information. Weapon/target-priority heuristic information is widely used in large-scale WTA algorithms because it improves real-time performance and quickly yields feasible solutions. However, the following analysis shows that heuristic information based on a local perspective can easily trap the search algorithm in a local optimum. Taking the 2×2 scale as an example, in Figure 5a, following the decision process based on weapon-priority heuristic information, the return expectation of weapon 1 against target 1 is 0.8 × 0.6 = 0.48, and the return expectation of weapon 1 against target 2 is 0.6 × 0.4 = 0.24. The principle of maximizing return assigns weapon 1 to target 1, and the value expectation of target 1 is updated to 0.6 × (1 − 0.8) = 0.12. The allocation of weapon 2 is then executed: the return expectation of weapon 2 attacking target 1 is 0.2 × 0.12 = 0.024, and that of weapon 2 attacking target 2 is 0.4 × 0.4 = 0.16. Therefore, the allocation scheme (w1 − t1, w2 − t2) is generated. Similarly, the same local optimal scheme is obtained with target-priority heuristic information. However, enumeration shows that the optimal scheme is (w1 − t2, w2 − t1). In Figure 5b, by contrast, the global optimal allocation scheme is obtained with the weapon/target-priority heuristic information.
Figure 5. (a) The weapon/target-priority heuristic information makes the decision plan fall into a local optimum; (b) the weapon/target-priority heuristic information leads to the global optimal plan.
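The greedy-versus-enumeration argument can be reproduced with a small Python sketch. The numbers below are hypothetical and are not those of Figure 5: weapon 0 is effective against both targets, while weapon 1 is effective only against target 0. The function `spread_first_order` is an illustrative stand-in for the statistical idea developed in the following paragraphs, not the paper's heuristic.

```python
import itertools


def plan_value(plan, v, p):
    """Expected destroyed target value; plan[i] is the target attacked by weapon i."""
    remaining = list(v)
    for i, j in enumerate(plan):
        remaining[j] *= 1 - p[i][j]
    return sum(v) - sum(remaining)


def weapon_priority_greedy(v, p, order=None):
    """Assign weapons one by one (in `order`) to the target with maximal marginal return."""
    remaining = list(v)
    plan = [None] * len(p)
    for i in (order if order is not None else range(len(p))):
        j = max(range(len(v)), key=lambda t: remaining[t] * p[i][t])
        remaining[j] *= 1 - p[i][j]
        plan[i] = j
    return plan


def exhaustive(v, p):
    """Enumerate every weapon-to-target plan and return the best one."""
    plans = itertools.product(range(len(v)), repeat=len(p))
    return list(max(plans, key=lambda plan: plan_value(plan, v, p)))


def spread_first_order(v, p):
    """Hypothetical ordering: weapons whose returns v_j * p_ij differ most go first."""
    spread = [max(v[j] * row[j] for j in range(len(v)))
              - min(v[j] * row[j] for j in range(len(v))) for row in p]
    return sorted(range(len(p)), key=lambda i: spread[i], reverse=True)


v = [1.0, 0.5]
p = [[0.9, 0.8], [0.9, 0.1]]
greedy = weapon_priority_greedy(v, p)
best = exhaustive(v, p)
spread = weapon_priority_greedy(v, p, spread_first_order(v, p))
print(greedy, plan_value(greedy, v, p), best, plan_value(best, v, p), spread)
```

With these values, the weapon-priority greedy plan [0, 0] destroys an expected value of 0.99, whereas enumeration finds [1, 0] with 1.3. Letting the "specialist" weapon 1 choose first also reaches [1, 0].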
Analyzing the underlying reasons and mechanisms leads to the following inference: heuristic information based on statistical marginal return avoids local optima better than heuristic information that maximizes marginal return from a single perspective.
The main reason why maximum marginal return heuristic information easily falls into a local optimum is that, when weapon-target killing efficiencies are coupled, the assignment chosen by maximum marginal return can be poor from a global perspective. For example, if the killing effect of weapon $w_i$ against the other targets is much smaller than against target $t_j$, the maximum marginal return heuristic executes the decision $w_i - t_j$, so that the threat weight of target $t_j$ decays rapidly. The marginal return of the remaining weapons $W \setminus \{w_i\}$ attacking target $t_j$ then becomes very small, and these weapons are usually allocated to the other targets $T \setminus \{t_j\}$. If weapon $w_i$ nevertheless has a better killing effect on the other targets $T \setminus \{t_j\}$ than the remaining weapons do, while the remaining weapons $W \setminus \{w_i\}$ have a better killing effect on target $t_j$, the decision has fallen into a poor local optimum.
To overcome this coupling mechanism, this paper constructs heuristic information based on statistical marginal return. If a weapon has high killing efficiency against a few targets and low killing efficiency against the others, it is considered to have high execution priority and should be assigned to its preferred target first. Otherwise, the target that this weapon can kill effectively may be taken by other weapons, so that the marginal return of that target drops sharply and the weapon is forced onto other targets with lower returns. When a weapon's killing efficiency differs little across targets, it is considered more versatile, that is, a "jack of all trades" with low execution priority, and it is assigned after the higher-priority weapons. Analysis of Figure 5 shows that with this idea the local optimum can be effectively avoided and the theoretical optimal solution obtained.
The weapon selection process based on statistical marginal return is as follows. Assume that the priority defense asset $a_p$ and the priority attack target $t_p$ are generated by Equation (25), and that the marginal return matrix $E = [e_{ij}]_{m\times n}$ of "weapon-target-priority asset" is generated by Equation (31). The heuristic information $G = \{g_1, g_2, \ldots, g_m\}$ of "weapon set-priority attack target-priority defense asset" is evaluated from $E$, and a weapon is selected to attack $t_p$ according to $G$.
The statistical marginal return heuristic information is constructed according to the following principles: (1) it reflects the absolute marginal return of the priority asset under the allocation "weapon-priority attack target"; (2) it reflects the relative return of the asset set when the weapon abandons the priority target and attacks another target.
For each weapon $i = 1, 2, \ldots, m$, the heuristic information $g_i$ is decomposed into an information pair $(g^1_i, g^2_i)$ according to these principles. $g^1_i$ represents the absolute marginal return of the priority asset $a_p$ when weapon $w_i$ is allocated to the priority target $t_p$, namely $e_{i t_p}$; $g^2_i$ represents the relative return of the asset set $A$ when weapon $w_i$ attacks the priority target $t_p$ instead of another target. The relative return should first reflect the absolute priority of "$w_i - t_p$" in the "$w_i$ - target set $T$" sequence, and second reflect the dispersion of the distance between "$w_i - t_p$" and "$w_i$ - low-return targets." Therefore, from the perspective of weapon $w_i$, we define the targets whose killing efficiency is higher than that of $w_i - t_p$ as the high-return target set $T_p = \{j \mid e_{ij} > e_{i t_p}\}$, and the targets whose killing efficiency is lower than that of $w_i - t_p$ as the low-return target set $T_s = \{j \mid 0 < e_{ij} < e_{i t_p}\}$; targets for which the attack condition is not satisfied, that is, $e_{ij} = 0$, are excluded from this partition. The relative return is designed as a two-term expression, where the first term represents the absolute priority of $i - t_p$ in the $i - T$ sequence, and the second term represents the dispersion of $i - t_p$ from the $i - T_s$ sequence, reflecting the relative priority of $i - t_p$. The heuristic information $g^2_i$ thus depends first on the absolute priority, and then on the relative priority when the absolute priorities are equal. The heuristic information pair constructed in this paper is given by Equation (34). According to Equation (34), the information set $G = \{g_1, g_2, \ldots, g_m\}$ composed of the heuristic information pairs $g_i(E, t_p) = (g^1_i, g^2_i)$ of all weapons is obtained and used as an indicator of each weapon's execution priority.
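Because the exact relative return formula is not reproduced in this section, the following sketch uses a hypothetical stand-in for $g^2_i$: the pair $(-|T_p|, \text{dispersion})$, compared lexicographically so that absolute priority dominates. The dispersion is taken as the summed gap $e_{it_p} - e_{ij}$ over $T_s$, divided by the number of feasible alternative targets, and a weapon with no feasible alternative is treated as a pure specialist. The function name `heuristic_pairs` is hypothetical.

```python
def heuristic_pairs(E, t_p):
    """Return one (g1, g2) pair per weapon, or None if it cannot attack t_p.

    g1 = e_{i,t_p}: absolute marginal return on the priority target.
    g2 = (-|T_p|, dispersion): hypothetical stand-in for the relative return,
    compared lexicographically (absolute priority first, dispersion second).
    """
    pairs = []
    for row in E:
        g1 = row[t_p]
        if g1 <= 0:                       # infeasible or useless against t_p
            pairs.append(None)
            continue
        others = [e for j, e in enumerate(row) if j != t_p and e > 0]
        high = [e for e in others if e > g1]     # high-return set T_p
        low = [e for e in others if e < g1]      # low-return set T_s
        # Gap to the low-return targets, averaged over all feasible alternatives;
        # with no alternative at all, the weapon is treated as a pure specialist.
        dispersion = sum(g1 - e for e in low) / len(others) if others else g1
        pairs.append((g1, (-len(high), dispersion)))
    return pairs


E = [[0.4, 0.4, 0.0],    # versatile: equally good elsewhere
     [0.5, 0.1, 0.0],    # specialist on t_p = 0
     [0.0, 0.3, 0.2],    # cannot attack t_p
     [0.2, 0.6, 0.0]]    # better against another target
pairs = heuristic_pairs(E, t_p=0)
print(pairs)
```

In this example, weapon 1 obtains both the largest $g^1$ and the largest dispersion, matching the intuition that specialists should be assigned first.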
In summary, the A-DWTA decision method proposed in this paper is as follows. Let the current solution set be $D^r = \{D_1, D_2, \cdots, D_{u_m}\}$, where $u_m$ is the upper limit on the size of the solution set and $r$ is the current weapon consumption. For a single solution $D_i$, determine the priority defense asset $a_p$ and the priority attack target $t_p$, compute the statistical marginal return information set $G = \{(g_i^1, g_i^2)\}$ of each available weapon $w_i$ under "$a_p - t_p$," and select the non-dominated pairs in $G$ to form a new information set $G'$. Let $d$ be the maximum number of differentiations of a single solution. If the number of non-dominated pairs exceeds $d$, i.e., $|G'| > d$, the $d$ pairs corresponding to $d$ usable weapons are selected in descending order of $g^1$, and $D_i$ is differentiated into $d$ feasible solutions; otherwise, $D_i$ is differentiated directly into $|G'|$ feasible solutions. The differentiated solutions are moved to the differentiation set $D'$. This process is repeated until every individual in $D^r$ has been differentiated. The fitness of each solution in $D'$ is then calculated, and the $u_m$ individuals with the best fitness form the solution set $D^{r+1}$ for the next iteration. The whole procedure is repeated until the algorithm termination condition is met. The flowchart of HA-SMR is shown in Figure 6. In HA-SMR, the differentiation strategy also maintains the diversity of the solution set, which helps avoid local optima. Differentiated individuals are generated in descending order of $g^1$ because, for the combat objective of asset defense, the absolute return of killing targets at the current stage should take priority over the relative return.
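The differentiation-and-selection loop resembles a beam search. The sketch below is a minimal, hypothetical rendering of one iteration: `heuristic_info`, `assign`, and `fitness` are placeholder callables standing in for the paper's priority-pair computation, weapon assignment, and fitness evaluation, which are not given in code form. The non-dominated filter maximises both $g^1$ and $g^2$.

```python
def non_dominated(info):
    """Keep the (weapon, g1, g2) entries that no other entry Pareto-dominates (both maximised)."""
    front = []
    for w, a1, a2 in info:
        dominated = any(
            b1 >= a1 and b2 >= a2 and (b1 > a1 or b2 > a2) for _, b1, b2 in info
        )
        if not dominated:
            front.append((w, a1, a2))
    return front


def differentiation_weapons(info, d):
    """All of G' if |G'| <= d, otherwise the d entries with the largest g1."""
    front = sorted(non_dominated(info), key=lambda entry: entry[1], reverse=True)
    return [w for w, _, _ in front[:d]]


def hasmr_iteration(D, heuristic_info, assign, fitness, u_m, d):
    """Differentiate every solution in D^r, then keep the u_m fittest as D^{r+1}."""
    D_prime = []
    for sol in D:
        for w in differentiation_weapons(heuristic_info(sol), d):
            D_prime.append(assign(sol, w))
    D_prime.sort(key=fitness, reverse=True)
    return D_prime[:u_m]


# Toy usage: a solution is a tuple of assigned weapons, and each weapon's pair is fixed.
g = {"w1": (0.9, -1.0), "w2": (0.5, 0.0), "w3": (0.4, -0.5)}


def info_of(sol):
    return [(w, *g[w]) for w in g if w not in sol]


def extend(sol, w):
    return sol + (w,)


def score(sol):
    return sum(g[w][0] for w in sol)


D = [()]
for _ in range(2):
    D = hasmr_iteration(D, info_of, extend, score, u_m=2, d=2)
```

Capping each solution at $d$ children and each iteration at $u_m$ survivors bounds an iteration at $u_m \cdot d$ candidate evaluations, so both parameters directly scale the search effort.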

Constraint Handling
In WTA problems, model constraints are usually used as algorithm termination conditions. Among the model constraints (22), a constraint on the expected marginal return of a weapon is proposed. This constraint not only improves the efficiency-cost ratio of the decision but also controls the algorithm flow adaptively. In step 7 of Algorithm 1, the weapon-target feasibility $C(s)$ is updated by Formula (35), in which $k_a$ is the asset that target $j$ intends to attack, as in Formula (31), and $\rho$ is the weapon consumption threshold of Formula (20). Formula (35) marks the assignment $w_i - t_j$ as infeasible when it would reduce the cost-effectiveness ratio; $w_i$ is then allocated to another target whose threshold has not yet been met, or retained for the next stage.
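Formula (35) itself is not reproduced in this text. The sketch below assumes one plausible reading, consistent with $\rho$ being called a damage threshold later in the paper: once the cumulative kill probability $1 - \prod_i (1 - p_{ij})$ of the weapons already assigned to target $j$ reaches $\rho$, every $c_{ij}$ for that target is set to 0. It ignores the intended asset $k_a$ that the actual formula involves, and the function names are hypothetical.

```python
def kill_probability(P, assigned_to_j, j):
    """1 - prod(1 - p_ij) over the weapons i already assigned to target j."""
    survive = 1.0
    for i in assigned_to_j:
        survive *= 1.0 - P[i][j]
    return 1.0 - survive


def update_feasibility(C, P, assignment, rho):
    """Set c_ij = 0 for every weapon i once target j's kill probability reaches rho.

    assignment maps a target index j to the list of weapon indices assigned to it.
    C is modified in place and also returned.
    """
    for j in range(len(C[0])):
        if kill_probability(P, assignment.get(j, []), j) >= rho:
            for row in C:
                row[j] = 0  # further weapons on t_j are treated as not cost-effective
    return C
```

Weapons freed this way stay feasible for the other columns of $C$, which matches the idea that they are redirected to unmet targets or kept for the next stage.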
In the constraint conditions (22) of the A-DWTA model, the first three constraints are already satisfied by the coding method, and the fourth is satisfied by the termination condition above. The dynamic constraints that remain to be handled during solving are the weapon-target feasibility constraints and the target-resource feasibility constraints. Following the information flow of the algorithm, the observable variables connecting weapons, targets, and resources are the weapon-target kill probability matrix $P$, the weapon-target attack feasibility matrix $C$, and the target-resource damage matrix $Q$. The algorithm uses $\{W, T, A, C\}$ to update $P$ and $Q$: the components corresponding to infeasible variables are set to 0, so they yield no assignment return. For example, $p_{ij} = 0$ indicates one of the following: (1) weapon $i$ is not available; (2) target $j$ has been killed; (3) weapon $i$ is available and target $j$ survives, but the attack condition of $w_i - t_j$ is not satisfied, namely $c_{ij} = 0$. Similarly, $q_{jk} = 0$ indicates one of the following: (1) target $j$ has been killed; (2) asset $k$ has been destroyed; (3) target $j$ and asset $k$ both survive, but target $j$ does not intend to attack asset $k$. After this processing, the algorithm works on an effectively unconstrained problem and needs no additional constraint handling.
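A minimal sketch of this de-constraining step, assuming plain nested lists, boolean state vectors, and an `intent` list giving the asset each target aims at (all hypothetical names): every infeasible entry of $P$ or $Q$ is zeroed, so it contributes no assignment return.

```python
def mask_matrices(P, Q, C, weapon_ok, target_alive, asset_alive, intent):
    """Return copies of P (m x n) and Q (n x l) with infeasible entries set to 0.

    C[i][j] is 1 if w_i can attack t_j at this stage; intent[j] is the asset index
    that target t_j intends to attack.
    """
    m, n, l = len(P), len(Q), len(Q[0])
    # p_ij survives only if the weapon is available, the target alive, and c_ij = 1.
    P2 = [[P[i][j] if weapon_ok[i] and target_alive[j] and C[i][j] else 0.0
           for j in range(n)] for i in range(m)]
    # q_jk survives only if target and asset are alive and t_j aims at a_k.
    Q2 = [[Q[j][k] if target_alive[j] and asset_alive[k] and intent[j] == k else 0.0
           for k in range(l)] for j in range(n)]
    return P2, Q2
```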

Experimental Studies
This section evaluates the advantages of the A-DWTA/RH model and the HA-SMR proposed in this paper by means of experimental cases and comparison algorithms.

Operational Scenario and Performance Metrics
In the experiments, the target-asset destroy intention and the weapon-target attack condition are initialized and evolved as follows. (1) Target-asset destroy intention. The defender is assumed not to know the targets' attack strategy; target intentions are identified by the threat assessment system. The target-asset destroy intention is initialized as follows: (i) when there are no fewer targets than assets ($n \ge l$), target intentions are generated randomly such that each asset is attacked by at least one target; (ii) when there are fewer targets than assets ($n < l$), target intentions are generated randomly such that each target aims at a different asset. The target-asset destroy probability is initialized as $q_{jk}(s) = q_l + (q_h - q_l) \cdot \mathrm{rand}$, for $j = 1, 2, \cdots, n$; $k = 1, 2, \cdots, l$; $s = 1, 2, \cdots, s_m$, where $\mathrm{rand}$ is a uniform random number in $[0, 1]$, and $q_h$ and $q_l$ are the upper and lower bounds of target-asset vulnerability.
(2) Weapon-target attack condition. The weapon-target kill probability is initialized by $p_{ij}(1) = p_l + (p_h - p_l) \cdot \mathrm{rand}$, for $i = 1, 2, \cdots, m$; $j = 1, 2, \cdots, n$, where $p_h$ and $p_l$ denote the upper and lower bounds of the weapon-target kill probability. In subsequent stages, because the operational state is continuous, the weapon-target kill matrix of the current stage should depend on the kill matrix, the weapon states, and the target states of the previous stage, and the kill probability is transferred between stages accordingly. The weapon-target feasibility matrix $C$ of each stage follows [37]:

$$c_{ij}(s) = \frac{1}{2}\left[\operatorname{sign}\big(\mathrm{rand} - \mathrm{ratio}(s)\big) + 1\right], \qquad \mathrm{ratio}(s) = c_l + \frac{(c_h - c_l)(s - 1)}{\hat{s}},$$
$$s = 1, 2, \ldots, s_m; \quad i = 1, 2, \ldots, m; \quad j = 1, 2, \ldots, n,$$

where $c_{ij}(s) = 1$ means that the launch condition of the $i - j$ pair is met at stage $s$, and $c_{ij}(s) = 0$ otherwise; $\mathrm{ratio}(s)$ is the probability that weapon $i$ cannot launch at target $j$ at stage $s$. This threshold grows with the stage $s$, so the probability of generating zero elements in $C$ increases, which matches the evolution trend of attack uncertainty in the DWTA problem. $c_h$ and $c_l$ are the upper and lower bounds of $\mathrm{ratio}(s)$, and $\hat{s}$ is a constant not less than the maximum stage number $s_m$. $\operatorname{sign}(\cdot)$ equals 1 if its argument is positive and −1 otherwise.

Given the application background of A-DWTA, the normalized asset value (NAV) is used as the dynamic metric, the weapon consumption and the number of stages are used as the mission completion metrics, and the algorithm complexity and computation time are used as the real-time metrics. Together, these metrics cover the operational requirements of decision-makers.
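The scenario initialization can be sketched as follows. The sketch assumes `rand` is uniform on $[0, 1)$ and reads the feasibility threshold as $\mathrm{ratio}(s) = c_l + (c_h - c_l)(s - 1)/\hat{s}$ with $\hat{s} \ge s_m$; the inter-stage transition of the kill probabilities is not reproduced in the text and is omitted. Helper names and the numeric bounds in the usage lines are illustrative, not the paper's settings.

```python
import random


def init_intention(n, l, rng):
    """intent[j]: index of the asset that target t_j aims at, per rules (i) and (ii)."""
    if n >= l:
        # (i) every asset is aimed at by at least one target
        intent = list(range(l)) + [rng.randrange(l) for _ in range(n - l)]
        rng.shuffle(intent)
    else:
        # (ii) each target aims at a different asset
        intent = rng.sample(range(l), n)
    return intent


def uniform_matrix(rows, cols, low, high, rng):
    """Entries low + (high - low) * rand, as used for q_jk(s) and p_ij(1)."""
    return [[low + (high - low) * rng.random() for _ in range(cols)] for _ in range(rows)]


def ratio(s, c_l, c_h, s_hat):
    """Probability that a weapon-target pair cannot launch at stage s."""
    return c_l + (c_h - c_l) * (s - 1) / s_hat


def sign(x):
    """1 if x is positive, -1 otherwise (as defined in the text)."""
    return 1 if x > 0 else -1


def feasibility(m, n, s, c_l, c_h, s_hat, rng):
    """c_ij(s) = (sign(rand - ratio(s)) + 1) / 2, so every entry is 0 or 1."""
    r = ratio(s, c_l, c_h, s_hat)
    return [[(sign(rng.random() - r) + 1) // 2 for _ in range(n)] for _ in range(m)]


rng = random.Random(1)
m, n, l = 6, 4, 3
intent = init_intention(n, l, rng)
Q = uniform_matrix(n, l, 0.3, 0.9, rng)  # q_jk, later masked by the intentions
P = uniform_matrix(m, n, 0.5, 0.9, rng)  # p_ij(1)
C1 = feasibility(m, n, 1, 0.1, 0.5, 5, rng)
```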

Experiments on Comparison Algorithms
The A-DWTA problem has received less research attention than other WTA problems, and the models and solving algorithms proposed in different studies differ, so it is not easy to compare them under the same experimental framework. We therefore select studies with a similar model framework as the comparison group and adopt their verified algorithms as comparison algorithms: the Hybrid Genetic Algorithm (HGA), Hybrid Ant-Colony Optimization (HACO), the Memetic Algorithm based on Greedy Local Search (MA-GLS), and the Rule-based Heuristic Algorithm (RHA). Except for RHA, which has no parameters, the model and the comparison algorithms use the parameter settings listed in the accompanying parameter table. To complete the defense mission, the number of weapons should be no less than the number of targets. To compare the algorithms at different problem scales, experimental cases are generated as follows: the number of targets $n$ is randomly generated in a scale interval $[n_l, n_u]$, the number of weapons $m$ in $[1.5n, 2n]$, and the number of assets $l$ in $[0.5n, 1.5n]$. In the interval $[1, 100]$, the target number $n$ is randomly generated with a step of 10, and the nine resulting experimental cases are listed in Table 2. Each comparison algorithm solves each case independently 30 times. The statistics of the results are given in Table 3, and the distributions of NAV and computation time are shown as box plots in Figure 7. In the box plots, the notches give a robust estimate of the uncertainty of the medians for box-to-box comparison, and the symbol + denotes outliers.
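A hypothetical case generator that follows these sampling rules. Rounding the interval ends inward to integers is an assumption, since the text does not say how non-integer bounds such as $1.5n$ are handled.

```python
import math
import random


def make_case(n_l, n_u, rng):
    """Draw (m, n, l): n in [n_l, n_u], m in [1.5n, 2n], l in [0.5n, 1.5n], as integers."""
    n = rng.randint(n_l, n_u)
    m = rng.randint(math.ceil(1.5 * n), 2 * n)
    l = rng.randint(math.ceil(0.5 * n), math.floor(1.5 * n))
    return m, n, l


rng = random.Random(7)
m, n, l = make_case(41, 50, rng)
print(f"W{m}T{n}A{l}")  # label in the W..T..A.. style used in the text
```

Because $m \ge 1.5n$, every generated case also satisfies the requirement that weapons are no fewer than targets.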
Table 3 and Figure 7 show that the mean NAV of HA-SMR is only 0.002∼0.005 lower than that of HGA and HACO in cases 1 and 2, slightly lower than that of HACO in cases 4 to 7, and better than that of all other algorithms in cases 8 and 9. This is because SMR has no apparent convergence advantage at relatively small scales; as the scale grows, however, HGA, HACO, and MA-GLS, limited by the problem scale and population size, converge worse than HA-SMR. RHA can be regarded as a special case of HA-SMR with a single search direction, so its NAV is lower than that of HA-SMR. HA-SMR and RHA are deterministic algorithms, so the mean square error of their NAV is theoretically 0. For the other, swarm-intelligence algorithms, the mean square errors of NAV lie between 0.001 and 0.005 and increase with problem size. In terms of real-time performance, HA-SMR outperforms the other algorithms, followed by RHA. The computation times of HA-SMR and RHA are of the same order of magnitude, while those of the other algorithms are more than an order of magnitude higher. Likewise, the theoretical mean square error of the two deterministic algorithms' computation time is 0, whereas that of the other algorithms increases with problem scale and lies between 0 and 5 s. In conclusion, up to the scale of W150T100A100, HA-SMR converges more robustly and effectively balances optimality and real-time performance.
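NAV is the main quality metric in Table 3, but its formula is not restated in this section. The sketch below assumes the usual asset-based WTA form: expected surviving asset value divided by total asset value, where target $j$ survives its assigned weapons with probability $\prod_i (1 - p_{ij})^{x_{ij}}$ and asset $k$ survives with probability $\prod_j \left(1 - q_{jk} \cdot \text{survival}_j\right)$. Treat it as an approximation of the paper's definition rather than a copy of it; `expected_nav` and its arguments are hypothetical.

```python
def expected_nav(v, P, Q, x):
    """Assumed NAV: expected surviving asset value / total asset value.

    v[k]: asset values; P[i][j]: kill probabilities; Q[j][k]: target-asset damage
    probabilities (0 where t_j does not attack a_k); x[i][j] in {0, 1} is the plan.
    """
    m, n, l = len(P), len(Q), len(v)
    # Probability that each target survives the weapons assigned to it.
    target_survive = []
    for j in range(n):
        s = 1.0
        for i in range(m):
            if x[i][j]:
                s *= 1.0 - P[i][j]
        target_survive.append(s)
    # Each asset survives only if every surviving attacker fails to destroy it.
    value = 0.0
    for k in range(l):
        a = 1.0
        for j in range(n):
            a *= 1.0 - Q[j][k] * target_survive[j]
        value += v[k] * a
    return value / sum(v)
```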
Besides NAV and computation time, minimizing the number of stages is also an essential operational requirement and a motivation of this paper. To verify that the SMR design meets the "intercept as soon as possible" combat requirement, Figure 8 shows the distribution of the actual number of stages obtained by each algorithm over 30 independent runs on the same case.
According to Figure 8, the decision plans obtained by HA-SMR solving the A-DWTA/RH model require fewer stages than those of the other algorithms. When the problem scale is small (cases 1 to 4), HA-SMR and RHA need two or three stages. Letting $s_{min}$ denote the number of decision stages obtained by HA-SMR, the numbers of stages obtained by HGA, HACO, and MA-GLS are distributed in the interval $[s_{min}, 4]$. The proportion of four-stage plans produced by HACO is significantly smaller than that of HGA and MA-GLS. In conclusion, the plan obtained by HA-SMR completes the target-killing task in as few stages as possible. To further illustrate the advantages of HA-SMR through the weapon-target-asset states at each stage, Figure 9 shows the mean values of NAV, remaining weapons, and surviving targets from the observe phase of each stage. In Figure 9, HA-SMR and RHA consume more weapons at the 1st stage than the other algorithms, so the observed numbers of threat targets and remaining weapons drop rapidly at the 2nd stage. However, because of RHA's greedy strategy, if targets survive the 1st stage, the remaining weapons at the 2nd or 3rd stage are less effective at killing them, whereas the statistical return strategy of HA-SMR ensures that the remaining weapons can still kill the surviving targets effectively. Because of their full-stage prediction mechanism, the other algorithms' decision strategies are more conservative than that of HA-SMR, yet they have fewer remaining weapons than HA-SMR when the operational mission is completed. Hence HA-SMR solving A-DWTA/RH achieves a better efficiency-cost ratio.

Parameter Sensitivity
In this section, the effects of specific parameters in the A-DWTA/RH model and HA-SMR on operational missions are analyzed experimentally.

Model Weight λ
In the A-DWTA/RH model, the parameter $\lambda$ weighs the killing efficiency of the weapons consumed at the current stage against the counter-effectiveness of the remaining weapons. To analyze its influence on the operational mission, simulation experiments are run with $\lambda = [0.1 : 0.1 : 0.9]$. The problem scale of the example is W100T50A50, and the other variables are generated as in the previous experiment. For each value of $\lambda$, three operational mission termination conditions are defined: 1. All assets are destroyed ($l(s_m) = 0$): the defense mission fails, and $\Gamma = 0$. 2. Some assets survive and all targets are killed ($l(s_m) > 0$ and $n(s_m) = 0$): the defense mission is completed, and $\Gamma = 1$. 3. Some assets and some targets survive, but all weapons have been consumed ($l(s_m) > 0$, $n(s_m) > 0$, and $m(s_m) = 0$): the defense mission is predicted to fail, and $\Gamma = -1$.
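The three termination conditions map directly onto a small classifier. `mission_state` is a hypothetical name, and returning `None` while no condition holds yet is an added convention, not part of the paper.

```python
def mission_state(l_s, n_s, m_s):
    """Gamma from surviving assets l(s_m), surviving targets n(s_m), remaining weapons m(s_m)."""
    if l_s == 0:
        return 0    # all assets destroyed: the defense mission fails
    if n_s == 0:
        return 1    # assets remain and every target is killed: mission completed
    if m_s == 0:
        return -1   # targets remain but no weapons are left: predicted failure
    return None     # no termination condition holds yet
```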
The experimental results for each value of $\lambda$ are shown in Table 4. To further analyze the dynamic performance of HA-SMR solving A-DWTA/RH under different values of $\lambda$, Figure 10 plots the number of surviving assets, the number of surviving targets, the number of remaining weapons, NAV, and plan fitness observed at each stage. In Table 4 and Figure 10, as $\lambda$ increases from 0.1 to 0.9, the number of decision stages decreases from 4 to 2, and the computation time decreases from 3.3302 s to 1.2706 s. The reason is that the A-DWTA/RH objective $F$ combines the kill effect $f_1$ of the current decision with the counter effect $f_2$ of the weapons remaining after execution, and $\lambda$ is the weight of the former. When $\lambda$ is small, the decision plan limits weapon consumption at the current stage and preserves countermeasures for subsequent stages, which increases the number of stages and the computation time. NAV reaches its maximum of 0.9832 at $\lambda = 0.7$, and the fitness increases with $\lambda$, indicating that the objective $F$ of A-DWTA/RH is more sensitive to $f_1$ than to $f_2$. Even at $\lambda = 0.9$, the plan never consumes so many weapons at the current stage that none remain for later stages ($\Gamma = -1$). In the mission termination state, the defender completes the mission ($\Gamma = 1$) for every value of $\lambda$. However, as $\lambda$ increases, the share of the current stage's decision return increases, and so does the weapon consumption of the current stage; consequently, more assets survive when the operational mission is completed. Notably, when $\lambda$ is 0.7, 0.8, and 0.9, the number of surplus weapons keeps decreasing (15, 7, and 4, respectively), while the number of decision stages and the number of surviving assets remain unchanged at 2 and 49, respectively. Thus, when $\lambda$ exceeds 0.7, weapon resources are easily wasted.
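The text states only that $\lambda$ weights $f_1$. The sketch below assumes the complementary form $F = \lambda f_1 + (1 - \lambda) f_2$ and uses two made-up plans, with $(f_1, f_2)$ values that are not from the paper, to show how raising $\lambda$ shifts the preferred plan from conservative to radical.

```python
def objective(f1, f2, lam):
    """Assumed weighted form F = lam * f1 + (1 - lam) * f2."""
    return lam * f1 + (1.0 - lam) * f2


def preferred(plans, lam):
    """Name of the plan with the largest F for a given lam."""
    return max(plans, key=lambda name: objective(*plans[name], lam))


# Illustrative (f1, f2) pairs: current-stage kill effect vs. remaining counter effect.
plans = {"radical": (0.9, 0.2), "conservative": (0.5, 0.7)}
for lam in (0.1, 0.5, 0.9):
    print(lam, preferred(plans, lam))
```

With these numbers the preference switches at $\lambda = 5/9 \approx 0.56$, where $0.2 + 0.7\lambda = 0.7 - 0.2\lambda$.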
In conclusion, the model parameter λ balances the "radical-conservative" degree of the decision plan, and reasonable values lie between 0.6 and 0.8.

Problem Scale {M, N, L}
In case 9 of Table 4, only 4 weapons remain, which suggests that the relative numbers of weapons, targets, and assets directly affect the completion of the combat mission. To provide a reference for configuring forces that can complete the A-DWTA/RH mission, the dynamic relationship among the numbers of weapons, targets, and assets is analyzed experimentally. In this experiment, the number of targets is fixed at 50. To make completing the mission possible, there should be more weapons than targets. The number of weapons is set to $m = [60 : 10 : 100]$ in turn, and the number of assets to $l = [20 : 10 : 80]$. The model parameter $\lambda$ is set to 0.8, and the remaining variables are the same as in the previous experiment. Because the scales and parameters of the cases in this simulation differ, they can only be compared through statistics of the win-loss indicator. The distribution of the defense mission completion indicator is shown in Figure 11. As Figure 11 shows, all assets are destroyed ($\Gamma = 0$) only when the number of assets is 5 and the number of weapons is 55, 60, or 70. The reason is that there are too few assets and the number of weapons is only 10% to 20% higher than the number of incoming targets, which cannot guarantee asset survival within a few stages. When the number of weapons is below 70 and the number of assets rises to the number of targets (50), the outcome is mostly $\Gamma = -1$, i.e., the weapons are used up, because the number of weapons is too close to the number of targets to complete the defense mission. As the number of assets increases ($l > 50$), the targets cannot destroy all assets within a few stages under the attack of the weapons, so the weapons are exhausted first. Finally, when the number of weapons is at least 75, the probability of successful defense ($\Gamma = 1$) increases significantly, and when it is at least 95, the proportion of $\Gamma = 1$ reaches 100%.
In summary, regardless of the number of assets, having at least 95 weapons against 50 targets (i.e., at least 1.9 times as many weapons as targets) ensured completion of the defense mission in these experiments.

Set Capacity $u_m$ and Differentiation Number $d$
In HA-SMR, the capacity $u_m$ of the solution set and the differentiation number $d$ of a single solution are the key parameters controlling the search space. The following cases are set up to verify the influence of $u_m$ and $d$ on HA-SMR: in the initial stage, the number of weapons $m$ is 80, the number of targets $n$ is 50, and the number of assets $l$ is 50; the weapon-target kill probability $P$ and the target-asset damage matrix $Q$ are generated by the formulas of the operational scenario, and the weapon consumption threshold $\rho$ is 0.8. The capacity of the solution set is varied as $u_m = [10:10:100]$ and the differentiation number as $d = [1:1:10]$. Figure 12 shows the fitted surfaces of four metrics under different $u_m$ and $d$. Figure 12a shows that $u_m$ and $d$ have no impact on the completion of the defense mission: the deterministic convergence mechanism of HA-SMR gives an effective solution for all tested values. In Figure 12b, the number of decision stages $s_m$ decreases as $u_m$ increases and increases as $d$ increases. According to Figure 12c, increasing $u_m$ and $d$ improves NAV. In Figure 12d, as $u_m$ rises from 10 to 100 and $d$ from 1 to 10, the computation time increases from 0.0064 s to 20.1675 s. In conclusion, $u_m$ should be made as large as the real-time requirement allows, and $d$ can be set to 5; increasing $d$ further does not significantly improve NAV and increases the number of decision stages and the computation time.

Threshold Parameter ρ
To verify the weapon consumption threshold $\rho$ in HA-SMR, the following cases are set up: in the initial stage, the number of weapons $m$ is 100, the number of targets is 20, and the number of assets is 20; the weapon-target kill probability $P$ and the target-asset destroy matrix $Q$ are generated by the formulas of the operational scenario; and the solution set capacity $u_m$ and differentiation number $d$ are set to 100 and 5. The threshold is varied as $\rho = [0.1:0.1:0.9]$, and HA-SMR is used to solve the A-DWTA/RH model. Figure 13 records the number of weapons consumed in the first stage for each $\rho$, together with NAV and the NAV return as weapon consumption increases. To further analyze the dynamic performance of HA-SMR under different $\rho$, Figure 14 shows the mission completion state, the number of decision stages, NAV, the number of surplus weapons, and the number of surviving targets for $\rho$ from 0.1 to 0.9. In Figure 14, the defense mission cannot be completed ($\Gamma = 0$) when $\rho$ is 0.1 or 0.2: all assets are destroyed by the targets. The reason is that such a small $\rho$ overly restricts weapon consumption at each stage, making both the number of stages ($s_m$ = 5 and 19) and the number of surplus weapons large. As $\rho$ rises from 0.3 to 0.9, the defense mission state is $\Gamma = 1$, the number of decision stages drops from 13 to 1, the number of surviving targets is 0, and the number of surplus weapons decreases from 73 to 2. These metrics are relatively stable for $\rho$ between 0.3 and 0.7. Consistent with the previous experimental conclusions, $\rho$ of 0.7∼0.9 gives better cost-effectiveness: it minimizes the number of decision stages, maximizes the asset value, and avoids wasting weapon resources while still ensuring completion of the defense mission.
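To see why a larger threshold consumes more weapons in the first stage, assume $\rho$ caps the cumulative kill probability per target and that each shot kills independently with the same probability $p$ (both simplifications, not the paper's stated model). The number of weapons one target absorbs is then the smallest $k$ with $1 - (1 - p)^k \ge \rho$; `weapons_to_reach` is a hypothetical helper.

```python
def weapons_to_reach(rho, p):
    """Smallest k with 1 - (1 - p)**k >= rho for identical, independent shots of kill probability p."""
    if not 0.0 < p <= 1.0 or rho >= 1.0:
        raise ValueError("need 0 < p <= 1 and rho < 1")
    k, kill = 0, 0.0
    while kill < rho:
        k += 1
        kill = 1.0 - (1.0 - p) ** k  # cumulative kill probability after k shots
    return k


for rho in (0.3, 0.5, 0.7, 0.8, 0.9):
    print(rho, weapons_to_reach(rho, 0.6))
```

With $p = 0.6$, thresholds of 0.3 and 0.5 need one weapon, 0.7 and 0.8 need two, and 0.9 needs three, so raising $\rho$ front-loads consumption and leaves fewer targets for later stages.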

Conclusions and Future Work
Most A-DWTA studies adopt a global return model based on multi-stage prediction, which cannot fully reflect the operational requirements of the current stage and burdens computing resources. Motivated by this, the A-DWTA problem is first analyzed with OODA theory; the A-DWTA/RH decision model is then established, and the solving algorithm HA-SMR is designed. Experimental results show that the obtained decision plan completes the defense mission in fewer decision stages than several comparison algorithms. The "radical-conservative" degree of the plan can be adjusted through the model parameter $\lambda$ and the algorithm parameter $\rho$; $\lambda$ is suggested to be 0.6∼0.8 and the damage threshold $\rho$ 0.7∼0.9.
The proposed heuristic information based on statistical marginal return is not limited to the A-DWTA model; it is also applicable, to some extent, to weapon selection in various WTA problems. In this paper, the heuristic information is designed through qualitative analysis of domain knowledge and lacks rigorous theoretical verification. Deriving weapon selection information quantitatively for WTA in specific scenarios remains a challenge for future work.