Autonomous Trajectory Planning and Control of Anti-Radiation Loitering Munitions under Uncertain Conditions

As an autonomous system, an anti-radiation loitering munition (LM) experiences uncertainty in both a priori and sensed information during loitering, because target radar information is difficult to know accurately in advance and the sensing performance of the seeker is affected by disturbances and errors. If, as in current practice, these uncertainties are ignored and the LM simply follows its planned route, its battle effectiveness is severely restricted. To tackle this problem, this paper studies a method for autonomous planning and control of loitering routes that uses limited a priori information about the target radar together with real-time sensing results. We establish a motion and sensing model based on the characteristics of anti-radiation LMs and use particle filtering to iteratively infer the target radar information. Based on model predictive control, we select the loitering path that minimizes the uncertainty of the target information, so that the planned trajectory is conducive to the acquisition of target radar information. Simulation results show that the proposed method can effectively complete the autonomous trajectory planning and control of anti-radiation LMs under uncertain conditions.

runway pattern. However, in an actual battle with fierce confrontation and a complex information environment, it may not be possible to obtain accurate target radar information in advance. At the same time, the anti-radiation seeker's sensing results are themselves uncertain because of missed signal detections, false alarms, and random errors. Therefore, it is necessary to study the trajectory planning and control of an anti-radiation LM under uncertain conditions. Studies on the planning and control of unmanned systems under uncertain conditions have been increasing for environmental monitoring [6] and search and rescue [7]. Schlotfeldt et al. [8] studied the problem of ground robots searching for uncertain targets with a Gaussian prior distribution using sensing corrupted by additive Gaussian noise; they used an invariant extended Kalman filter (IEKF) to iteratively estimate the target position and took the variance of the target estimate as the path cost to plan the direction of a robot's motion. The feasibility of information entropy, mutual information, and KL divergence as indicators of target information uncertainty was discussed from the perspective of sensor deployment, and these measures were noted to be equivalent to variance measures when the prior and noise follow Gaussian distributions [9]. However, when the distribution is not unimodal, using variance as a metric misrepresents the uncertainty of the target information.
Therefore, based on a model of the anti-radiation LM's motion and sensing characteristics, we use particle filtering to perform iterative Bayesian inference of the target radar information, which solves the problem of estimating the target radar state under a priori and sensing uncertainty. We then take the uncertainty of the target information, measured by conditional entropy, as the control optimization objective to be minimized. Using a model predictive control method, we select the heading of the anti-radiation LM for the airborne autopilot to execute, which yields loitering trajectory planning and control that is conducive to the acquisition of target radar information. The general idea of planning control is shown in Figure 2. The effectiveness of the method is verified by simulation experiments.
The rest of this paper is organized as follows. The problem is described and modeled in Section 2. The inference of the target radar position is described in Section 3. Section 4 shows how the model predictive control method is applied to the control of loitering under uncertainty. Simulation results and conclusions are presented in Sections 5 and 6, respectively.

Mission Scenario
Assume a fixed target radar station in a known battle airspace $\mathcal{A} = [x_{\min}, x_{\max}] \times [y_{\min}, y_{\max}]$. An anti-radiation unmanned aerial vehicle has arrived and entered the autonomous patrol mode.

Anti-Radiation LM Trajectory Planning Model
The motion planning models used in LM trajectory planning usually include free mass point [4], direction-limited mass point [10], and first-order coordinated turn [11]. The more accurate the trajectory planning model, the easier it is for the flight controller to complete the planned trajectories or selected control actions. The anti-radiation LM seeker is mounted on the nose and is limited by the field of view. During loitering, by default, the airborne autopilot uses coordinated turns to control the heading, keeping the nose direction consistent with the flight direction. Some models of anti-radiation LMs are specially designed with side-enforced plates to assist in accurate and coordinated turning [1].
Referencing the coordinated turning motion model, and assuming that the anti-radiation LM flies in the battle airspace at a constant speed and height during the loitering phase, that the control action space is a finite set of discrete yaw angles, and that the flight height can be ignored, we build a two-dimensional trajectory planning model on the flight plane of the LM. In this model, $s^e_t$ and $s^n_t$ are the east and north coordinates, respectively, of the anti-radiation LM relative to a certain origin of the battle airspace at time $t$ in the northeast coordinate system; $v$ is the constant speed; $\theta_t$ is the yaw angle; and $u_t \in \mathcal{U} = \{0, -\omega, \omega\}$ denotes the control action of the anti-radiation LM at time $t$, i.e., level flight, yawing to the left, or yawing to the right.
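The kinematics described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the model equation of the paper: the first-order update, the convention that yaw is measured clockwise from north, the unit time step, and the names `step`, `OMEGA`, and `ACTIONS` are all hypothetical choices made for the example.

```python
import math

# Assumption: yaw is measured clockwise from north, so a positive yaw
# increment turns the LM to the right and a negative one to the left.
OMEGA = math.radians(15.0)          # yaw increment used in the experiments
ACTIONS = (0.0, -OMEGA, OMEGA)      # level flight, yaw left, yaw right


def step(state, u, v=30.0, dt=1.0):
    """Advance an (east, north, yaw) state by one decision step.

    The yaw command u (radians) is applied first, then the LM moves
    a distance v * dt along the new heading.
    """
    east, north, yaw = state
    yaw = (yaw + u) % (2.0 * math.pi)
    east += v * dt * math.sin(yaw)   # east component of the displacement
    north += v * dt * math.cos(yaw)  # north component of the displacement
    return (east, north, yaw)


# Candidate next states from the initial position used in the experiments.
candidates = [step((-500.0, -480.0, 0.0), u) for u in ACTIONS]
```

Because the action set has only three elements, the candidate states reachable in a few steps can be enumerated exhaustively, which is what makes a tree search over actions practical.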

Radar Target Modeling
The state of the target radar consists of its position coordinates and transmission power, which is represented by a vector, $\mathbf{x} = [x_e, x_n, P_0]^{\mathsf{T}}$, where $x_e$ and $x_n$ are the east and north coordinates, respectively, of the radar target in the northeast coordinate system of a certain origin in the battle airspace, and $P_0 > 0$ is the radar transmission power reference constant, which is determined by the logarithmic transmission power at a fixed distance from the radar transmitter. For the anti-radiation LM, $\mathbf{x}$ is an unknown vector, described by an a priori probability distribution based on pre-battle intelligence information and the commander's judgment. When the target radar position information is unknown, it is considered uniformly distributed in the battle airspace.

Seeker Sensing Modeling
The target radar information sensed by an anti-radiation LM consists mainly of the signal amplitude and angle of arrival measured by the seeker on the signal within the main lobe of the antenna. Due to the complex electromagnetic environment of the battlefield, the seeker receives environmental background noise and impact clutter [12] in addition to the real signal of the target radar, which lead to uncertainties such as missed signal detections, false alarms, and random errors in the seeker's sensing results.
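Under noise disturbance, the sensing result obtained by the anti-radiation LM seeker at time $t$ can be expressed as [13] $\mathbf{z}_t = h_n\big(h(\mathbf{x}, \mathbf{s}_t), \mathbf{n}(\mathbf{x}, \mathbf{s}_t)\big)$, where $\mathbf{z}_t$ is the real-valued sensing result of the seeker at time $t$, $\mathbf{x}$ is the target radar state, $\mathbf{s}_t$ is the LM pose, $\mathbf{n}(\cdot)$ is the noise related to the LM's pose and environment, $h(\cdot)$ is the theoretical sensing result without noise disturbance, and $h_n(\cdot)$ is a nonsingular vector transformation that characterizes the sensing result.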
Assuming that the noise level obeys a zero-mean Gaussian distribution with variance $\sigma^2$ and that the constant false-alarm rate is $P_{fa}$, the radar signal radiation intensity reaching the receiver can be modeled as a function of the transmission power reference constant and the relative distance, where $\alpha > 1$ is a constant related to the electromagnetic propagation mode, and $d_t = \sqrt{(x_e - s^e_t)^2 + (x_n - s^n_t)^2}$ is the relative distance between the LM and the radar target.
Affected by internal device thermal noise and environmental background noise, the seeker detects radar signals by setting a detection threshold and filtering out signals below it. Too high a detection threshold decreases the sensitivity of detection and misses the target, while too low a threshold fails to filter out the noise and generates false signals. The consequences of saturating the back-end signal processing module with too many false alarms are more serious than those of a missed detection. Therefore, modern anti-radiation LM seekers usually adopt a constant false-alarm detection system [14], whose detection threshold is set according to the noise intensity and a given false-alarm probability. The adaptive detection threshold $T_h$ satisfies $P(n \ge T_h) = P_{fa}$, where $n$ is the noise level; hence, $T_h = \sigma\,\Phi^{-1}(1 - P_{fa})$, where $\Phi^{-1}(\cdot)$ is the quantile function of the standard Gaussian distribution. The signal detection probability is then determined by the given false-alarm rate, the noise, and the strength of the target radar radiation signal.
The arrival angle of a signal is obtained by the direction measurement of the antenna array mounted on the front of the LM, whose zero direction is consistent with the direction of the LM fuselage. The signal arrival angle measurement is modeled as the bearing of the target radar relative to the fuselage direction, computed with atan2, the four-quadrant inverse tangent function, plus an angle measurement noise introduced by random distortion of the target radar waveform, whose distribution usually has a long tail [15]. We model this noise with the symmetric alpha-stable distribution [16], which is specified by two parameters.
So far, the target radar signal sensed by the anti-radiation LM seeker can be modeled as a set of unordered vectors (Equation (6)) whose angles lie within the effective angle measurement range determined by the antenna directional characteristics of the seeker. The number of detected signals in the sensing result varies, and the result may contain real signals coming from the target radar or consist entirely of false signals caused by noise.
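To make the constant false-alarm threshold and the long-tailed angle noise concrete, the following Python sketch may help. It rests on assumptions that are not taken from the paper: the received signal is treated as an amplitude plus additive zero-mean Gaussian noise, so the detection probability follows directly from the threshold, and the symmetric alpha-stable noise is drawn with the Chambers-Mallows-Stuck method with a characteristic exponent `alpha` and a scale `gamma`. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def cfar_threshold(sigma, p_fa):
    """Threshold exceeded by zero-mean Gaussian noise with probability p_fa."""
    return sigma * NormalDist().inv_cdf(1.0 - p_fa)


def detection_probability(signal, sigma, p_fa):
    """Probability that signal plus Gaussian noise exceeds the CFAR threshold.

    Assumes additive noise, which is an assumption of this sketch.
    """
    threshold = cfar_threshold(sigma, p_fa)
    return 1.0 - NormalDist(mu=signal, sigma=sigma).cdf(threshold)


def sample_symmetric_alpha_stable(alpha, gamma, rng=random):
    """Draw one symmetric alpha-stable variate (Chambers-Mallows-Stuck)."""
    v = rng.uniform(-math.pi / 2.0, math.pi / 2.0)
    w = rng.expovariate(1.0)
    if alpha == 1.0:
        return gamma * math.tan(v)          # Cauchy special case
    x = (math.sin(alpha * v) / math.cos(v) ** (1.0 / alpha)
         * (math.cos(v - alpha * v) / w) ** ((1.0 - alpha) / alpha))
    return gamma * x


# Threshold for the noise level and false-alarm rate used in the experiments.
threshold = cfar_threshold(sigma=3.0, p_fa=0.05)
```

With a signal of zero amplitude the detection probability reduces to the false-alarm rate, which is a quick sanity check on the threshold.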
Assuming that the false-alarm signals are uniformly distributed over the direction measurement range of the seeker, when the seeker's angular resolution is high ($M \gg 1$, with $M$ the number of angular resolution cells) and the false-alarm rate is low ($P_{fa} \ll 1$), the appearance of false-alarm signals in sensing can be considered a Poisson point process with rate parameter $\lambda = M \cdot P_{fa}$.
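A small Python sketch of this false-alarm model is given below. It assumes that the rate is the product of the number of angular resolution cells and the false-alarm rate (120 and 5% in the experiments, giving a rate of 6 per scan), and that the angle measurement range is symmetric about the fuselage axis. Both the symmetric range and the function names are assumptions of the example.

```python
import math
import random


def sample_poisson(rate, rng=random):
    """Draw a Poisson count with Knuth's multiplication method.

    Adequate for the small rates considered here.
    """
    limit = math.exp(-rate)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def sample_false_alarm_angles(n_cells, p_fa, fov, rng=random):
    """Return the arrival angles of the false alarms in one seeker scan.

    fov is the total angular width (radians), assumed symmetric about zero.
    """
    rate = n_cells * p_fa
    count = sample_poisson(rate, rng)
    return [rng.uniform(-fov / 2.0, fov / 2.0) for _ in range(count)]


angles = sample_false_alarm_angles(120, 0.05, math.radians(60.0))
```

Each call produces an unordered list of angles whose length varies from scan to scan, which is why the sensing result has to be treated as a set rather than a fixed-length vector.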

Inferencing Target Radar Position
The anti-radiation LM continuously uses the seeker's sensing results to update the information of the target radar and perform Bayesian inference in the air. Assume that the a priori distribution of the target radar state at time $t$ is $p_t(\mathbf{x}) = p(\mathbf{x} \mid \mathbf{z}_{1:t}, \mathbf{s}_{1:t})$, that the radar target is stationary, and that the result of the seeker's sensing at time $t+1$ is $\mathbf{z}_{t+1}$. Using Bayes' theorem, the probability distribution of the radar target state at time $t+1$ is updated as
$$p_{t+1}(\mathbf{x}) = \frac{p(\mathbf{z}_{t+1} \mid \mathbf{x}, \mathbf{s}_{t+1})\, p_t(\mathbf{x})}{\int p(\mathbf{z}_{t+1} \mid \mathbf{x}', \mathbf{s}_{t+1})\, p_t(\mathbf{x}')\, d\mathbf{x}'} \tag{7}$$
where the integral reduces to a finite sum in a finite-state hidden Markov model. For a linear system with a Gaussian prior, the closed-form solution of the parameters of the conjugate distribution can be obtained recursively. However, for a non-Gaussian nonlinear observation, such as Equation (6), Equation (7) has no closed-form recursive solution.
To solve this problem, we use a sampling-importance resampling (SIR) particle filter and implement Bayesian recursion through Monte Carlo simulation. The SIR particle filter obtains a set of weighted random samples (particles) by sampling with the prior distribution as the proposal distribution, and it uses the likelihood function of the sampled particles to update the weights so as to approximate the posterior probability density. The SIR particle filter uses resampling to reduce particle degeneracy and improve computational efficiency [17]. Let $\langle \mathbf{x}_{t,i}, w_{t,i} \rangle_{i=1}^{N}$ be the positions and weights of a group of particles at time $t$, where $w_{t,i} \in \mathbb{R}^{+}$ and $\sum_{i} w_{t,i} = 1$. The probability distribution of the radar target states at time $t$ can be approximated by particle weights and positions as
$$p(\mathbf{x} \mid \mathbf{z}_{1:t}, \mathbf{s}_{1:t}) \approx \sum_{i=1}^{N} w_{t,i}\, \delta(\mathbf{x} - \mathbf{x}_{t,i})$$
where $\delta(\mathbf{x} - \mathbf{x}_{t,i})$ is the unit impulse function at $\mathbf{x}_{t,i}$ [17].
SIR particle filtering has prediction and update steps. Since the target radar is stationary, the target motion model leaves the states of the sampled particles unchanged in the prediction step. However, resampling decreases particle diversity (particle depletion) and reduces estimation accuracy. To maintain particle diversity, random disturbances with bandwidths inversely proportional to the particle weights are introduced, $\mathbf{x}_{t+1,i} = \mathbf{x}_{t,i} + \boldsymbol{\eta}_{t,i}$, where the disturbance $\boldsymbol{\eta}_{t,i}$ is a standard normal random variable scaled by this bandwidth and by a selected constant much smaller than 1 [18].
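An update step uses the likelihood function of the sensing result to update the particle weights as
$$w_{t+1,i} \propto w_{t,i}\, p(\mathbf{z}_{t+1} \mid \mathbf{x}_{t+1,i}, \mathbf{s}_{t+1})$$
with the weights normalized to sum to 1. In the presence of false-alarm signals, the likelihood function of the sensing result $p(\mathbf{z}_t \mid \mathbf{x}, \mathbf{s}_t)$ is obtained from the joint association probability of the radar target and the sensing result. Let $\mathbf{z}_t^{j}$ be the $j$-th element in the unordered vector set $\mathbf{z}_t$, whose likelihood can be obtained according to the LM seeker's sensing characteristic; see Equation (6). The likelihood takes one form when $\lVert \mathbf{z}_t \rVert > 0$, i.e., when at least one signal is detected, and another when $\lVert \mathbf{z}_t \rVert = 0$, i.e., $\mathbf{z}_t = \emptyset$. The estimate of the target radar state at this time is then computed from the weighted particles.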
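The update and resampling steps for a static target can be sketched as follows. This is an illustrative Python outline, not the authors' implementation: the likelihood is passed in as a black-box function, systematic resampling is chosen here for simplicity, the diversity-preserving disturbance is omitted, and the weighted mean is used as the state estimate. All of these choices and names are assumptions.

```python
import random


def sir_update(particles, weights, likelihood, rng=random):
    """Reweight particles by the likelihood, normalize, and resample.

    particles: list of state tuples; weights: list of floats summing to 1;
    likelihood: function mapping a particle to p(z | particle).
    """
    new_weights = [w * likelihood(p) for p, w in zip(particles, weights)]
    total = sum(new_weights)
    if total == 0.0:
        # Degenerate measurement: keep the previous belief unchanged.
        return list(particles), list(weights)
    new_weights = [w / total for w in new_weights]

    # Systematic resampling: one random offset, n evenly spaced pointers.
    n = len(particles)
    start = rng.random() / n
    resampled, cumulative, j = [], new_weights[0], 0
    for i in range(n):
        pointer = start + i / n
        while pointer >= cumulative and j < n - 1:
            j += 1
            cumulative += new_weights[j]
        resampled.append(particles[j])
    return resampled, [1.0 / n] * n


def weighted_mean(particles, weights):
    """Point estimate of the state as the weighted mean of the particles."""
    dim = len(particles[0])
    return tuple(sum(w * p[k] for p, w in zip(particles, weights))
                 for k in range(dim))
```

In a full filter, a small random disturbance would be added to the resampled particles before the next update so that repeated copies of the same particle spread out again.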

Loitering Control by Minimizing Target Uncertainty
According to the sensing characteristics of the seeker, the sensing result has uncertain factors due to environmental noise disturbance and measurement errors, and certain factors that are determined by the relative position and orientation between the anti-radiation LM and the target radar. Autonomous anti-radiation LMs can plan and control the selection of loitering paths, and they can fly to a position in an orientation that is conducive to the acquisition of target radar information. This improves the probability of obtaining high-quality sensing and optimizes loitering trajectories.

Measure of Target Information Uncertainty
The motion and sensing of an anti-radiation LM are highly nonlinear. The sensor noise is not Gaussian. The prior information of the target radar must be extracted from prior intelligence, which cannot be guaranteed to be Gaussian. Therefore, we use information entropy to measure the uncertainty of target radar information.
Information entropy is a widely used measure of the uncertainty of random variables. The higher the information entropy, the greater the uncertainty. Considering target radar states and sensing results as random variables $X$ and $Y$, the information entropy of $X$ is
$$H(X) = -\int p(x) \log p(x)\, dx \tag{13}$$
In radar position estimation, omitting the time subscript $t$, the conditional probability of using the sensing data to update the estimation of the target radar state is $p(x \mid y)$. We substitute this in Equation (13) and average over the sensing variable to obtain the conditional entropy of the target radar information given the sensing variable $Y$,
$$H(X \mid Y) = -\iint p(x, y) \log p(x \mid y)\, dx\, dy$$
Conditional entropy reflects the uncertainty of the target radar information after Bayesian inference with the sensing variable $Y$, and it determines the lower limit of the accuracy of estimating $X$ with $Y$ [19]. The lower the conditional entropy, the higher the achievable estimation accuracy; for a scalar $X$ with entropy measured in nats, the estimation error satisfies
$$\mathbb{E}\big[(X - \hat{X}(Y))^2\big] \ge \frac{1}{2\pi e}\, e^{2H(X \mid Y)}$$
Let $T$ be the time when loitering ends, and let the subscript $1{:}T$ represent the set of all subscripts $t = 1, 2, \cdots, T$. We seek to minimize the conditional entropy of the target radar information given the sensing results and establish an optimal control model for the choice of loitering path of the anti-radiation LM,
$$\min_{u_{1:T}} H(\mathbf{x} \mid \mathbf{z}_{1:T}) \tag{16}$$
subject to the LM motion model and the admissible control actions.

Conditional Entropy Calculation Based on Particle Position Weight
Since particle filtering uses a set of discrete weighted particles sampled from the proposal distribution to approximate the target posterior distribution, we cannot directly calculate the conditional entropy [20]. Many conditional entropy approximation algorithms based on particle filtering have been proposed [6,21-23]. A line segment fitting method was used to approximate conditional entropies [21], which is equivalent to a nearest-neighbor estimate with the number of nearest neighbors set to one [22]. This method has no parameters to be adjusted, but its accuracy decreases as dimensionality increases. A Gaussian kernel function was used to smooth the posterior distribution represented by weighted particles [6]; this requires the selection of a reasonable kernel bandwidth. First-order historical information of particles was used to approximate the conditional entropy in gradient calculation [22,23], which requires storing only the states of the particles at the previous moment and has a computational complexity that grows with the number of particles. In this paper, the target radar state is static, so the transition density between particles satisfies $p(\mathbf{x}_{t,i} \mid \mathbf{x}_{t-1,j}) = 1$. Substituting the constant 1 for $p(\mathbf{x}_{t,i} \mid \mathbf{x}_{t-1,j})$ in Equation (52) of [23] yields an approximation of the conditional entropy.
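As an illustration of how such an approximation can be evaluated from particles, the Python sketch below assumes one common form of particle-based entropy estimate with the transition density set to 1: the logarithm of the weighted evidence minus the posterior-weighted mean log-likelihood. Whether this matches the expression obtained from [23] term by term is an assumption, and the function name is hypothetical.

```python
import math


def entropy_from_particles(prev_weights, likelihoods):
    """Approximate the posterior entropy from weights and likelihoods.

    prev_weights: normalized particle weights before the update;
    likelihoods: p(z | particle) for each particle.
    Assumed form: log(sum_i w_prev_i * l_i) - sum_i w_new_i * log(l_i).
    """
    evidence = sum(w * l for w, l in zip(prev_weights, likelihoods))
    if evidence <= 0.0:
        raise ValueError("all particles have zero likelihood")
    new_weights = [w * l / evidence
                   for w, l in zip(prev_weights, likelihoods)]
    # Particles with zero posterior weight contribute nothing to the sum.
    expected_log_lik = sum(w * math.log(l)
                           for w, l in zip(new_weights, likelihoods)
                           if w > 0.0)
    return math.log(evidence) - expected_log_lik


flat = entropy_from_particles([0.25] * 4, [1.0, 1.0, 1.0, 1.0])
peaked = entropy_from_particles([0.25] * 4, [1.0, 0.0, 0.0, 0.0])
```

An uninformative measurement leaves the value unchanged, while a measurement that singles out one particle lowers it, which is the behavior a planner needs when ranking candidate positions.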

Model Predictive Optimal Control
As an optimal control problem, the full time-domain optimal closed-loop feedback control law in Equation (16) is usually difficult to obtain. Model predictive control adopts the idea of rolling-horizon optimization and considers optimization only over a finite prediction horizon. It solves the finite-horizon optimization problem in an open loop, takes the resulting control action sequence, applies its first action at the current moment, and then solves again with the state information of the next moment. Model predictive optimal control has the following advantages. As a method that solves open-loop optimization problems, it ensures the feasibility of solving problems with complex constraints. By using the system's state feedback to periodically seek a new solution, it improves execution reliability in the presence of environmental disturbances. State prediction takes future benefits into account and can improve system performance.
Expanding Equation (16) over the prediction horizon gives the finite-horizon problem in Equation (18). Since the maneuvering options are limited, we follow the motion element graph search method [24]: taking the current LM state $\mathbf{s}_t$ as the root node, we select $u$ from the action set $\mathcal{U}$ to expand the candidate nodes $\hat{\mathbf{s}}_{t+1}$, calculate the conditional entropy of the group of weighted particles at each predicted time step, and construct a motion element graph with LM states as nodes, control actions as edges, and the conditional entropy as the node score. A traversal search finds the optimal solution, which gives the optimal control action sequence in Equation (18).
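The traversal search over a small action set can be sketched in Python as an exhaustive enumeration of action sequences. This is a simplified illustration under assumptions: the score of a sequence is taken to be the score of its final node, the motion model and the node score are supplied as functions, and the names are hypothetical. In the setting of this paper the score function would evaluate the predicted conditional entropy of the particle set at the candidate LM state.

```python
import itertools
import math


def plan_first_action(state, actions, horizon, step, score):
    """Enumerate all action sequences and return the best first action.

    step(state, u) predicts the next state; score(state) is the quantity
    to minimize at the end of the horizon (an assumption of this sketch).
    Returns (first_action, best_sequence, best_score).
    """
    best_sequence, best_score = None, math.inf
    for sequence in itertools.product(actions, repeat=horizon):
        s = state
        for u in sequence:
            s = step(s, u)          # roll the motion model forward
        value = score(s)
        if value < best_score:
            best_sequence, best_score = sequence, value
    return best_sequence[0], best_sequence, best_score


# Toy usage: move along a line toward position 5 in three unit steps.
action, sequence, value = plan_first_action(
    0, (-1, 0, 1), 3, lambda s, u: s + u, lambda s: abs(s - 5))
```

With three actions and a horizon of three steps there are only 27 sequences, so exhaustive enumeration stays cheap; only the first action of the best sequence is executed before the search is repeated.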

Experimental Verification by Simulation
To verify the effectiveness of the proposed method, we compared it with a purely stochastic strategy and with two methods from the literature [4,8] in a simulated scenario in which an anti-radiation LM arrives in a designated airspace for patrol and searches for a target radar with unknown location and radiation power.

Experimental Conditions
The simulated battle airspace was the plane area $\mathcal{A} = [-1000, 1000] \times [-1000, 1000]$ (in units of 100 m). The target radar coordinates were (300, 300). The radar radiation-related coefficient was $P_0 = 30.88$. The initial state model parameters of the anti-radiation LM were loitering speed $v = 30$ (180 km/h), initial heading angle $\theta_0 \sim U(0, 2\pi)$, yaw angle $\omega = 15°$, and initial position coordinates (−500, −480). The anti-radiation LM airborne seeker sensing model parameters were propagation constant $\alpha = 2$, false-alarm rate $P_{fa} = 5\%$, noise parameter $\sigma = 3$, effective angle measurement range 60°, symmetric alpha-stable distribution parameters 0.75 and 0.8, and $M = 120$. For a seeker at 350 distance units (3.5 km), the probability that the target radar signal exceeds the detection threshold is 95%. We assumed no advance intelligence information, so the target radar position was considered to be uniformly distributed in the battle airspace. Other parameters were as follows: a model prediction horizon of 3 steps and $N = 200$ particles in the particle filter. The simulation was coded with Octave [25] and ran on a computer with a 3.8-GHz i9 processor and 32 GB of memory.

Stochastic Decision-Making
In a purely stochastic strategy, at each decision moment the LM randomly selects, from the action space, a command that will not take it out of the battle airspace.
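This baseline amounts to filtering the action set by a boundary check and drawing uniformly from what remains, as in the Python sketch below. The square airspace bounds, the motion model passed in as `step`, and the function name are assumptions made for the illustration.

```python
import random


def random_feasible_action(state, actions, step, bounds, rng=random):
    """Pick uniformly among actions whose next position stays in bounds.

    step(state, u) returns the next (east, north, ...) state;
    bounds is (low, high), applied to both coordinates.
    Returns None if every action would leave the airspace.
    """
    low, high = bounds
    feasible = [u for u in actions
                if all(low <= c <= high for c in step(state, u)[:2])]
    return rng.choice(feasible) if feasible else None
```

Returning `None` when no action is feasible leaves it to the caller to decide how to recover, for example by turning toward the center of the airspace.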

Re-Planning Based on Field-of-View Coverage
Under the uncertainty conditions in this study, the actual position of the radar target is not known, and the method in [4] cannot be used directly. Therefore, the path planning method of [4] was used only after a particle filter with the same parameters as ours was used to estimate the target position. When the sensing result of the anti-radiation LM seeker is updated, the target position estimate is updated, the route is re-planned, and the flight path of the anti-radiation LM is controlled according to the new route. The four parameters of the method of [4] were set to 200, 0.9, 0.05, and 50; refer to [4] for details.

Method Assuming Gaussian Noise
The IEKF method [8] was used to replace the particle filter algorithm used in this study to test the effectiveness of the particle filter in nonlinear and non-Gaussian cases. In this method, the nonlinear motion and sensing models are approximated by a first-order linear expansion, and the non-Gaussian noise component in the angle measurement is approximated by Gaussian noise $\mathcal{N}(0, 0.4)$. Since the prior distribution of the target radar information in the IEKF method must also be Gaussian, uniform distributions cannot be used. Each simulation randomly selects a Gaussian distribution with a center uniformly distributed within $[0, 600] \times [0, 600] \times [20, 40]$ and variance $(33^2, 33^2, 20^2)$ as the prior distribution of target radar information.

Simulation Results and Analysis
The first simulation experiment was conducted with a fixed total number of simulation steps ($T = 150$) to compare the performance of the methods. Table 1 shows the statistical results of the target radar conditional entropy $H(\mathbf{x} \mid \mathbf{z}_{1:T})$ and position RMSE at the end of 100 repeated simulations of the four methods. The target radar position RMSE is the Euclidean distance between the filtering-estimated position and the true position of the target radar; it reflects the absolute accuracy of the system's estimation of the target radar position. The smaller the RMSE, the higher the estimation accuracy. The data in Table 1 show that, at the end of the simulation, the target radar conditional entropy and position RMSE obtained by the proposed method were the lowest, at about 30% of those obtained by the "field-of-view coverage" method with the re-planning mechanism.
Figure 3 shows the variation of the mean target radar conditional entropy and position RMSE with the number of steps over 100 simulations. The rates of decrease in conditional entropy and RMSE of the proposed method are greater than those of the other methods, indicating that the planning control method with conditional entropy as the optimization objective can better deal with uncertainty during loitering, achieve better final estimation accuracy, and reduce the effect of uncertainty.
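For reference, a position RMSE over repeated runs can be computed along the following lines. The Python sketch assumes that the RMSE is the root of the mean squared Euclidean distance between the estimated and true positions across runs; the function name and the sample values are hypothetical.

```python
import math


def position_rmse(estimates, truth):
    """Root-mean-square Euclidean distance between estimates and truth.

    estimates: list of (east, north) position estimates, one per run;
    truth: the true (east, north) position of the target radar.
    """
    squared = [(e[0] - truth[0]) ** 2 + (e[1] - truth[1]) ** 2
               for e in estimates]
    return math.sqrt(sum(squared) / len(squared))


# Two illustrative estimates around the true radar position (300, 300).
rmse = position_rmse([(303.0, 304.0), (297.0, 296.0)], (300.0, 300.0))
```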
Compared with stochastic decision making, the method of constantly re-planning based on the sensing of field-of-view coverage is beneficial to the estimation of the target radar position, but its performance is not as good as that of the IEKF method based on the Gaussian assumption or that of the proposed method. There are two reasons: (1) the method of constantly re-planning based on the sensing of field-of-view coverage makes advance planning in the prediction step without considering the motion characteristics of anti-radiation LMs; and (2) in the stage of high uncertainty, this method still plans the route directly from the estimated target radar position, and the use of low-precision estimation results degrades performance.
Based on the Gaussian assumption, the IEKF method has a slightly lower initial conditional entropy due to the different selection of prior distributions. As the simulation progresses, its rates of decrease in RMSE and conditional entropy are not as great as those of the proposed method, because noise with an alpha-stable distribution has a higher tail probability than noise with a Gaussian distribution and therefore produces more outliers. Furthermore, with the loss of accuracy due to the linear approximation of the IEKF, its estimation accuracy in nonlinear, non-Gaussian noise scenarios is lower than that of the particle filter method used in this paper. This lack of accuracy affects the choice of control actions and further affects the acquisition of target radar information. Therefore, the particle filter used in this paper can improve performance in nonlinear dynamic and non-Gaussian noise sensing scenarios.
As shown in Figure 3, the RMSE and conditional entropies obtained by different methods change with the simulation steps in three stages. At the beginning, while the conditional entropy decreases, the RMSE does not decrease. At this stage, the anti-radiation LM quickly reduces the uncertainty of target radar information through sensing, but because the estimated distribution of the target radar still is not unimodal, the estimated deviations are always relatively large. Subsequently, the decrease in conditional entropy is basically consistent with that of RMSE, indicating that the anti-radiation LM continues to reduce uncertainty through sensing while simultaneously improving estimation accuracy. Finally, the decline in conditional entropy and RMSE slows down, marginal information gains during loitering begin to decrease, and improvement of estimation accuracy slows down. At this time, according to the mission objective, the seeker can be switched from the wide-area search mode to the tracking mode, and the anti-radiation LM ends the loitering stage and executes a diving attack.
Figure 4 compares the total control inputs of the methods. The total control inputs reflect the total control effort consumed during the patrolling process and are related to the energy consumption of the flight trajectory: the smaller the total control inputs, the less energy is consumed during patrolling. It can be inferred from Figure 4 that the total control inputs of the four methods compared here, including the proposed method, are similar on average. However, the total control inputs of the "Coverage of Field", "Iterative EKF", and "Proposed Method" are slightly higher than those of the "Random Strategy", because the "Random Strategy" ignores the perception results and chooses the control action randomly. That the total control inputs of the proposed method are similar to those of the other methods shows that the performance improvement shown in Figure 3 does not come at the cost of additional control effort or energy consumption.
Figure 5 shows the trajectories and final particle distributions of the anti-radiation LM in a randomly selected simulation; panels (a), (b), (c), and (d) each show one of the trajectories generated in the 100 simulations by the random strategy method, the coverage of field method, the iterative EKF method, and the proposed method, respectively. Although, due to the uncertainty of a priori and sensing information, the planned path and inferred particle distribution of the anti-radiation LM are different in each simulation, it can be seen that the trajectory generated by our method circles more around the target radar and is closer to the trajectory that would be planned under certain conditions; at the end of the inference, the distribution of particles is also more concentrated near the true position.
The second simulation experiment was conducted with a variable number of simulation steps, and each simulation stops once the desired RMSE is reached. This experiment examines the time required by the different methods to achieve the desired RMSE for anti-radiation LM tracking and attacking in a combat scenario.
Table 2 shows the statistical results of the time step at which the desired RMSE is reached during the simulation. The lower the desired RMSE, the more time steps are needed to reach it. The time needed to reach the desired RMSE differs among the methods. The proposed method performs best, especially when a low RMSE is required. "Random Strategy" and "Coverage of Field" cannot reach the desired RMSE of 100 m within 2000 time steps, which is more than 10 times the number of steps the proposed method needs.

Conclusions
We studied the autonomous trajectory planning and control of anti-radiation LMs under conditions of uncertain a priori information of the target radar and the seeker's sensing. By modeling the problem as an optimal control problem that minimizes the uncertainty of target information, combined with particle filtering and model predictive control methods, better loitering tracking and higher quality target information than those of previous methods were obtained. The results can provide a reference for research and development and for the operational use of anti-radiation LMs.
It should be noted that we did not discuss trajectory planning control when the marginal information gains decrease in the later phase, when the LM shifts from the cruising stage to tracking and attacking, nor trajectory planning control when target decoys exist. Future research can be carried out in the following areas: (1) reducing computational cost and improving real-time performance by improving the proposal distribution, increasing the efficiency of particle sampling, and introducing box-type and adaptive particle filtering; (2) based on information entropy, studying the strategy of autonomous switching from loitering to tracking and attacking when the information return declines; (3) studying an anti-decoy loitering method that incorporates decoy information sensing in the presence of decoy radars; and (4) testing the proposed method in a high-fidelity simulation environment or a real-world experiment.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.

Conflicts of Interest:
The authors declare no conflict of interest.