Hybrid Dual-Scale Neural Network Model for Tracking Complex Maneuvering UAVs

Accurate tracking and prediction of unmanned aerial vehicle (UAV) trajectories are essential to ensure mission success, equipment safety, and data accuracy. Maneuverable UAVs exhibit complex and dynamic motion, and conventional tracking algorithms that rely on predefined models perform poorly when the model parameters are unknown. To address this issue, this paper introduces a hybrid dual-scale neural network model based on the generalized regression multi-model and cubature information filter (GRMM-CIF) framework. We establish the GRMM-CIF filtering structure to differentiate motion modes and reduce measurement noise. Furthermore, considering the trajectory datasets and the rates of motion change, neural networks at two different scales are designed. We propose the dual-scale bidirectional long short-term memory (DS-Bi-LSTM) algorithm to address prediction delays in a multi-model context. Additionally, we employ scale sliding windows and threshold-based decision-making to achieve dual-scale trajectory reconstruction, ultimately enhancing tracking accuracy. Simulation results confirm the effectiveness of our approach in handling the uncertainty of UAV motion and achieving precise estimations.


Introduction
Unmanned aerial vehicles (UAVs) are integral to various fields, including aerial photography, logistics, and search and rescue missions [1]. Tracking moving targets is essential to ensure effective UAV mission execution, maintain steady tracking of the target, and provide the necessary data and information for decision-making [2]. However, the maneuvering performance and behavior of UAVs may be affected by a variety of factors, such as wind, the operator's intent, and environmental influences, resulting in varied and unpredictable motion patterns. Therefore, it is difficult for tracking algorithms to establish accurate models in advance [3,4].
Over the last decade, researchers have explored various approaches to tracking targets [5,6]. UAV states are described using dynamic equations, with parameters incorporated into the state vector dimensions to facilitate joint estimation [7]. Past work has made significant advances in solving the problem of target tracking; however, the robustness and convergence of these algorithms depend directly on accurate initial estimates of the state, the process and measurement noise, the unknown parameters, and the covariance matrices [8][9][10]. As technology advances, modern target tracking environments become increasingly complex, which further increases the difficulty of tracking tasks [11]. To overcome this challenge, several extensions and modifications of the traditional interacting multiple model (IMM) algorithm have been proposed [12]. For instance, Wonkeun et al. suggested an adaptive Kalman filter IMM (AKF-IMM) to adaptively estimate the unknown time-varying measurement loss probability [13]. Adaptive target tracking using an IMM with heterogeneous velocity representations and linear/curved motion models was proposed in [14]. Lu et al. derived an adaptive IMM filter for jump Markov systems with inaccurate noise covariances and missing measurements based on the Kullback-Leibler average (KLA) [15]. While these extensions improve state estimation accuracy by dynamically adjusting the model and covariance matrix, they are limited to particular maneuvering target models [16][17][18].
Drones 2024, 8, 3
Data-driven approaches do not require prior knowledge or models; they can automatically extract features from data and make predictions by learning patterns and relationships in the input data [19][20][21]. For instance, deep learning algorithms such as long short-term memory (LSTM) [22] have shown promising results in target tracking applications. These algorithms can learn long-term dependencies, enabling them to better capture the dynamic characteristics of target motion [23].
However, in contrast to model-driven methods, data-driven approaches require extensive datasets for efficient model training, and the efficacy of the models depends heavily on data quality and quantity [24,25]. In addition, data-driven methods suffer from problems such as overfitting and instability. In the traditional multi-model (MM) algorithm [26], the model selected from estimates of previous observations always lags behind the current target state [27], causing performance deterioration, especially for highly maneuverable targets or unpredictable target movements. Therefore, combining model-driven and data-driven approaches is a promising way to achieve better target tracking [28]. This may involve combining data-driven techniques with traditional physical models to better understand the target's motion and maneuvering, improving tracking accuracy and robustness [29].
In this paper, we propose a multi-model tracking method based on dual-scale deep learning within the framework of the generalized regression multi-model (GRMM) and the cubature information filter (CIF) [30], which is applicable to multiple targets with complex maneuvering motions. Unlike the aforementioned approaches, this method combines model-driven and data-driven schemes to improve tracking accuracy and robustness. The primary contributions of this approach are as follows:
• To improve the multi-model algorithm based on the Markov transition chain, GRMM provides an effective Markov transition matrix from the database when the prior parameters of maneuvering UAVs are unknown. This improves the discrimination of the motion state of the maneuvering target, and CIF nonlinear filtering is applied to the measured values to improve the tracking accuracy of the maneuvering UAVs.
• We design a dual-scale Bi-LSTM network to correct state delay and improve the state estimation of the filter for maneuvering UAVs. This structure considers the temporal relationships of the maneuvering target's state vector at different scales, which enhances the filter's adaptability to complex maneuvering motions and reduces tracking errors caused by delays.
This manuscript is organized as follows: Section 2 provides a brief analysis of the motion characteristics of the maneuvering target and presents its mathematical modeling. Section 3 introduces the dual-scale neural network prediction algorithm under the multi-model filtering framework to deal with the delay problem of the maneuvering target state transition. Section 4 validates the effectiveness of the proposed multi-model adaptive tracking prediction method through simulation experiments, and our conclusions are presented in Section 5.

Nonlinear Motion Mode of Maneuvering Targets
This paper considers UAV tracking in a three-dimensional (3D) coordinate system. In real-world scenarios, state estimation for UAV tracking can be considered as a discrete-time nonlinear dynamic system [26]:

$$x_{k+1} = f(x_k, u_k) + \eta_k$$

where $x_{k+1} \in \mathbb{R}^n$ denotes the state vector of the system at time $k+1$, $f$ is a vector-valued (possibly time-varying) function, and $n$ is a positive integer. $u_k$ represents the control input at time step $k$; for weakly maneuvering trajectories, it can be ignored in the state transition equation [29]. $\eta_k$ is zero-mean Gaussian process noise, which can affect various components of the state vector $x_k$, such as velocity, distance, and other relevant parameters.
The linear discrete-time form of the above model is

$$x_{k+1} = F x_k + \eta_k$$

The transition matrix $F$ takes one of three forms, $F_{CV}$ for the constant velocity (CV) state, $F_{CA}$ for the constant acceleration (CA) state, and $F_{CT}$ for the constant-turn (CT) state, to meet the requirements of generating maneuvering target trajectories. According to [26], the CV mode is defined as

$$F_{CV}(k) = \mathrm{diag}\left(F^{cv}_k, F^{cv}_k, F^{cv}_k\right)$$

where each sub-matrix depends on the time step $\Delta\tau$ and diag denotes a block-diagonal matrix. $F^{cv}_k$, $F^{ca}_k$, and $F^{ct}_k$ are the 3 × 3 sub-transition matrices of the specific motion models (CV, CA, or CT) that the maneuvering target can follow during 3D tracking, so the resulting block-diagonal matrices $F_{CV}(k)$, $F_{CA}(k)$, and $F_{CT}(k)$ are 9 × 9.
For UAVs in cruising mode, we define the CA mode as

$$F_{CA}(k) = \mathrm{diag}\left(F^{ca}_k, F^{ca}_k, F^{ca}_k\right)$$

and the CT mode as

$$F_{CT}(k) = \mathrm{diag}\left(F^{ct}_k, F^{ct}_k, F^{ct}_k\right)$$

where $\omega_m$ denotes the turn rate for the constant-turn mode.
The measurement vector of the system at time $k$, $y_k \in \mathbb{R}^m$, can be expressed as

$$y_k = H_k(x_k) + [n_r, n_v, n_\varphi, n_\theta]^T$$

where $y_k = [r_k, v_k, \varphi_k, \theta_k]^T$ contains the distance $r_k$, Doppler velocity $v_k$, azimuth angle $\varphi_k$, and pitch angle $\theta_k$ measured by the radar to the target, relative to the radar station location. $H_k$ is the nonlinear 3D range measurement function, and $[n_r, n_v, n_\varphi, n_\theta]^T$ is the measurement noise of the distance, Doppler velocity, azimuth, and pitch angle.
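As an illustration of the block-diagonal structure and the radar measurement described above, the following minimal Python sketch builds the three 9 × 9 transition matrices and a noise-free measurement. The 3 × 3 sub-blocks are the common textbook CV, CA, and CT forms and are an assumption here, as are the state ordering `[x, vx, ax, y, vy, ay, z, vz, az]` and all function names; they are not taken from the paper's own equations.

```python
import math

def block_diag3(block):
    """Repeat one 3x3 block on the diagonal for the x, y, and z axes (9x9 result)."""
    F = [[0.0] * 9 for _ in range(9)]
    for axis in range(3):
        for i in range(3):
            for j in range(3):
                F[axis * 3 + i][axis * 3 + j] = block[i][j]
    return F

def cv_block(dt):
    # Assumed textbook CV block: position integrates velocity, acceleration is zeroed.
    return [[1.0, dt, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]

def ca_block(dt):
    # Assumed textbook CA block: constant acceleration kinematics.
    return [[1.0, dt, 0.5 * dt * dt], [0.0, 1.0, dt], [0.0, 0.0, 1.0]]

def ct_block(dt, omega):
    # Assumed textbook CT block with a nonzero turn rate omega (rad/s).
    s, c = math.sin(omega * dt), math.cos(omega * dt)
    return [[1.0, s / omega, (1.0 - c) / omega ** 2],
            [0.0, c, s / omega],
            [0.0, -omega * s, c]]

def propagate(F, state):
    """One noise-free step x_{k+1} = F x_k."""
    return [sum(F[i][j] * state[j] for j in range(len(state))) for i in range(len(F))]

def radar_measurement(state, station=(0.0, 0.0, 0.0)):
    """Noise-free [range, Doppler velocity, azimuth, pitch] seen from the radar station."""
    dx, dy, dz = state[0] - station[0], state[3] - station[1], state[6] - station[2]
    vx, vy, vz = state[1], state[4], state[7]
    r = math.sqrt(dx * dx + dy * dy + dz * dz)   # assumes the target is not at the station
    doppler = (dx * vx + dy * vy + dz * vz) / r  # radial component of the velocity
    azimuth = math.atan2(dy, dx)
    pitch = math.atan2(dz, math.hypot(dx, dy))
    return [r, doppler, azimuth, pitch]
```

For example, `propagate(block_diag3(ca_block(1.0)), state)` advances a state by one time step under the CA mode. For a very small turn rate the assumed CT block approaches the CA block, which is a quick sanity check on the three forms.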

IMM-CIF Method of Maneuvering Targets
IMM is an algorithm used for target tracking and estimation [7]. It runs multiple models simultaneously during the tracking process, with each model describing one motion behavior of the target. Figure 1 depicts the block diagram of the proposed IMM tracking algorithm, which employs three CIFs. The first filter uses the constant velocity mode to handle the straight-line motion of the target. The second filter addresses the turning motion of the target, while the last filter considers the target's acceleration motion. The workflow of the proposed IMM-CIF method is described as follows.

• Interaction of state estimation
Assuming $\hat{x}^{i}_{k-1}$ represents the state estimate of filter $i$ at time $k-1$ and $\gamma^{i|j}_{k-1}$ represents the mixing probability at time $k-1$, where $i, j = 1, \ldots, r$ and $r$ denotes the number of CIFs, the outcome of the interaction involving the state estimates, $\hat{x}^{oj}_{k-1}$, can be expressed as

$$\hat{x}^{oj}_{k-1} = \sum_{i=1}^{r} \gamma^{i|j}_{k-1}\, \hat{x}^{i}_{k-1}$$

The corresponding state covariance matrix $P^{oj}_{k-1}$ can be represented as [12]

$$P^{oj}_{k-1} = \sum_{i=1}^{r} \gamma^{i|j}_{k-1} \left[ P^{i}_{k-1} + \left(\hat{x}^{i}_{k-1} - \hat{x}^{oj}_{k-1}\right)\left(\hat{x}^{i}_{k-1} - \hat{x}^{oj}_{k-1}\right)^{T} \right]$$

In this step, the previous state estimates and their covariance matrices are mixed using the mixing probabilities $\gamma^{i|j}_{k-1}$, which are formed from the transition probability $\pi_{ij}$ from model $i$ to model $j$ and scaled by a normalization constant.
• Model probability update
The outputs of each CIF [17] are used to calculate the likelihood of each filter. The mode probability of the $j$th filter is then updated from its likelihood.
• Model output
Finally, all the filter outputs, including their state estimates $\hat{x}_k$ and error covariance matrices $P_k$, are weighted and fused using the updated mode probabilities. This process ultimately produces the output state estimate and its error covariance matrix.
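The three workflow steps above can be sketched for scalar states as follows. This is a minimal illustration of the standard textbook IMM recursion, not the paper's exact equations: the symbols `mode_probs`, `c_bar`, and `likelihoods` are assumptions, and the per-model filtering step (the CIF itself) is left out, so the likelihoods are passed in as given numbers.

```python
def imm_mix(states, variances, mode_probs, trans):
    """Interaction step. trans[i][j] is the transition probability from model i to model j."""
    r = len(states)
    # Normalization constant for each destination model j.
    c_bar = [sum(trans[i][j] * mode_probs[i] for i in range(r)) for j in range(r)]
    # Mixing probabilities gamma[i][j].
    gamma = [[trans[i][j] * mode_probs[i] / c_bar[j] for j in range(r)] for i in range(r)]
    # Mixed initial state and variance handed to filter j.
    mixed_x = [sum(gamma[i][j] * states[i] for i in range(r)) for j in range(r)]
    mixed_p = [sum(gamma[i][j] * (variances[i] + (states[i] - mixed_x[j]) ** 2)
                   for i in range(r)) for j in range(r)]
    return mixed_x, mixed_p, c_bar

def imm_update(c_bar, likelihoods):
    """Mode probability update from the likelihood of each filter."""
    weighted = [lik * c for lik, c in zip(likelihoods, c_bar)]
    total = sum(weighted)
    return [w / total for w in weighted]

def imm_fuse(states, variances, mode_probs):
    """Model output: probability-weighted state and variance."""
    x = sum(m * s for m, s in zip(mode_probs, states))
    p = sum(m * (v + (s - x) ** 2) for m, s, v in zip(mode_probs, states, variances))
    return x, p
```

With an identity transition matrix the models never exchange information, so the mixed states equal the inputs, which is a convenient check of the interaction step.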

Proposed Tracking Method
In this section, we introduce a UAV trajectory prediction approach that relies on GRMM-CIF and DS-Bi-LSTM, and we describe the details and steps of its implementation. The sequence prediction capability of LSTM and the multi-model switching capability of IMM are used to adapt to changes of the target between different motion modes. The relevant flowchart is shown in Figure 2. The data input module imports measurements acquired from radar detection of the target while initializing the state and parameters. The multi-model discrimination module employs an interactive GRMM multi-model structure to compute the model probabilities associated with the measurements. The CIF processing step filters and tracks the target measurements within the GRMM-CIF framework, facilitating the estimation of the motion state. The target's state is updated based on the outputs generated by the GRMM-CIF filter.
Subsequently, the filtering state correction is conducted, and the DS-Bi-LSTM network predicts the target's state at two scales, rectifying the delay encountered when tracking across multiple motion models. This entire process is executed iteratively, ensuring the continuous update of the target's state and predictions.

Maneuvering Target Multi-Model Tracking Based on GRMM-CIF
The proposed GRMM algorithm utilizes a neural network to calculate the Markov transition probabilities of multiple models. In this study, the Markov chain probabilities are updated using a generalized regression neural network (GRNN) [31]. GRNN is a neural network-based non-parametric model that estimates conditional probabilities between the observed data and the target models, thereby providing more accurate Markov chain probabilities. By iteratively observing and updating the probabilities, and then constructing an interactive multiple model, it becomes possible to dynamically estimate the target's motion model and adaptively adjust it based on the observed data during the tracking process.
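To make the GRNN idea above concrete, the sketch below implements the usual GRNN output rule, a Gaussian-kernel-weighted average of stored training targets. How the paper builds its training pairs is not stated in this section, so the choice of features as inputs and transition-probability rows as targets, the smoothing parameter `sigma`, and the function name are all assumptions for illustration.

```python
import math

def grnn_predict(query, train_x, train_y, sigma=1.0):
    """GRNN output: kernel-weighted average of the training targets.

    query   : list of floats (feature vector)
    train_x : list of feature vectors
    train_y : list of target vectors (here, assumed rows of transition probabilities)
    sigma   : kernel smoothing parameter (assumed hyperparameter)
    """
    weights = []
    for x in train_x:
        dist2 = sum((q - xi) ** 2 for q, xi in zip(query, x))
        weights.append(math.exp(-dist2 / (2.0 * sigma ** 2)))
    total = sum(weights)  # may underflow to 0 for queries far from all samples
    dim = len(train_y[0])
    return [sum(w * y[d] for w, y in zip(weights, train_y)) / total for d in range(dim)]
```

Because the output is a convex combination of the targets, it remains a valid probability vector whenever every training target is one.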
To implement this model, a Markov chain first needs to be constructed. The IMM algorithm uses several different models to describe the target's motion behavior, and each model corresponds to a state of the Markov chain. Initially, a Markov chain is defined in which each state corresponds to a model. Let the target state be $x$, and let there be $l$ modes in the IMM (denoted $M_1, M_2, \ldots, M_l$). Suppose the target state predicted by the LSTM model is $\hat{x}_{k-1}$. Each mode has its own state transition probability matrix $A$ and observation probability matrix $C$.
First, GRMM obtains the predicted weights by calculating the state transition probability and observation probability of each model. The weight of each model at the current moment, $w_k$, is calculated from $P(\hat{x} \mid x, M_l)$, the probability of the value predicted by the neural network model given the target state and the model, and $P(x \mid M_l)$, the probability of the target state under the model.
Next, the weight of each model is multiplied by its corresponding state transition probabilities to obtain a weighted state transition probability matrix $A(w_k)$. The weighted state transition probability matrices are then combined according to a chosen rule, such as simple summation or weighted averaging, to obtain the final state transition probability matrix $A'$. Finally, $A'$ is multiplied by the observation probability matrix $C$ to obtain the final target state prediction.
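The weighting and combination steps above can be sketched as follows, using the weighted-average rule that the text names as one option. The function name and the normalization of the weights are assumptions made so that each row of the combined matrix still sums to one; the paper's exact combination rule is not recoverable from this section.

```python
def combine_transition_matrices(matrices, weights):
    """Weighted average of per-model transition matrices.

    matrices : list of square row-stochastic matrices (lists of lists)
    weights  : one non-negative weight per matrix; normalized here (assumption)
    """
    total = sum(weights)
    w = [x / total for x in weights]
    n = len(matrices[0])
    return [[sum(w[m] * matrices[m][i][j] for m in range(len(matrices)))
             for j in range(n)] for i in range(n)]
```

For instance, combining a "sticky" matrix with a uniform one using weights 3 and 1 pulls the result three quarters of the way toward the sticky matrix.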

Dual-Scale Bi-LSTM Trajectory Prediction Method
Dual-scale maneuvering target trajectory prediction aims to predict the target's trajectory by combining historical data and real-time noise-containing measurement data. The historical data are used to model the target motion, while the real-time noise-containing measurements provide information to correct the prediction results.

Bidirectional Long Short-Term Memory
Gers et al. [32] proposed an LSTM network, which enables the network to handle long-term correlations between data by introducing cell states and a series of "memory forgetting" mechanisms. LSTM establishes a long-term information retention channel through its gate structure, which can effectively retain and extract long-term information. The structure diagram of the LSTM neural network is shown in Figure 3. LSTM is a variant of recurrent neural networks (RNNs) used for processing sequential data. The computation process of an LSTM cell is described by

$$i(k) = \sigma\left(W_i [h_{k-1}, x_k] + b_i\right)$$
$$f(k) = \sigma\left(W_f [h_{k-1}, x_k] + b_f\right)$$
$$a(k) = \tanh\left(W_a [h_{k-1}, x_k] + b_a\right)$$
$$o(k) = \sigma\left(W_o [h_{k-1}, x_k] + b_o\right)$$
$$c(k) = f(k) \odot c(k-1) + i(k) \odot a(k)$$
$$h_k = o(k) \odot \tanh\left(c(k)\right)$$

where $i(k)$, $f(k)$, $a(k)$, and $o(k)$ represent the input gate, forget gate, feature extraction, and output gate, respectively. $x_k$ is the input at moment $k$, $h_{k-1}$ is the hidden state value at moment $k-1$, $c(k)$ is the cell state, and $W_i, W_f, W_a, W_o$ and $b_i, b_f, b_a, b_o$ are the corresponding weight matrices and bias vectors, respectively. The activation functions employed in the neural networks at different scales are distinct; however, within the same LSTM layer, the activation function remains consistent.
Bi-LSTM is a variant of the LSTM neural network architecture, comprising two LSTM subnetworks: one that processes data in a forward manner and another that processes it in a backward manner. The forward LSTM processes the input sequence in the regular order, while the backward LSTM processes it in reverse. The Bi-LSTM structure, as shown in Figure 4, is designed to capture the temporal correlation information of the feature matrix of maneuverable targets at each time step. The proposed dual-scale network integrates different Bi-LSTM layers depending on the training scale used. The output vector $h_k$ corresponding to the $k$th time point of the $i$th Bi-LSTM is the element-wise sum of the forward $\overrightarrow{h}_k$ and backward $\overleftarrow{h}_k$ LSTM outputs:

$$h_k = \overrightarrow{h}_k + \overleftarrow{h}_k$$
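The LSTM cell and the element-wise sum of forward and backward outputs described above can be sketched in plain Python as follows. This is an illustrative, untrained forward pass only: the gate layout (one weight matrix acting on the concatenation of the previous hidden state and the input) is the common textbook form and is an assumption, as are the parameter dictionary keys and function names.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def affine(W, b, v):
    """Compute W v + b for a list-of-lists matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM step. params maps 'i', 'f', 'a', 'o' to (W, b) pairs (assumed layout)."""
    z = h_prev + x  # concatenation [h_{k-1}, x_k]
    i = [sigmoid(s) for s in affine(*params["i"], z)]    # input gate
    f = [sigmoid(s) for s in affine(*params["f"], z)]    # forget gate
    a = [math.tanh(s) for s in affine(*params["a"], z)]  # feature extraction (candidate)
    o = [sigmoid(s) for s in affine(*params["o"], z)]    # output gate
    c = [fk * ck + ik * ak for fk, ck, ik, ak in zip(f, c_prev, i, a)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c

def run_lstm(seq, params, hidden):
    """Run the cell over a sequence starting from zero hidden and cell states."""
    h, c = [0.0] * hidden, [0.0] * hidden
    outputs = []
    for x in seq:
        h, c = lstm_step(x, h, c, params)
        outputs.append(h)
    return outputs

def bi_lstm(seq, fwd_params, bwd_params, hidden):
    """Element-wise sum of forward outputs and time-aligned backward outputs."""
    fwd = run_lstm(seq, fwd_params, hidden)
    bwd = run_lstm(seq[::-1], bwd_params, hidden)[::-1]
    return [[p + q for p, q in zip(hf, hb)] for hf, hb in zip(fwd, bwd)]
```

In practice a deep learning framework would supply trained layers; the sketch only shows how the gate equations and the bidirectional sum fit together.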

Neural Network Structure for Maneuvering Target Trajectory Prediction
Maneuvering target tracking based on deep learning usually requires a large amount of data for training and optimization, but because the motion trajectory of a maneuvering target is irregular, the data in the track library do not completely cover the motion states of all maneuvering targets. Therefore, the predicted trajectory state can easily deviate from the real trajectory in long-range maneuvering target prediction.
Long scale (long-distance prediction): trained over a long horizon with more prediction points; its advantage is that the overall prediction of the track is more accurate.
Short scale (short-distance prediction): uses a small number of prediction points and predicts quickly; its advantage is that it can assist in determining the trajectory motion state in real time.
The accuracy of trajectory prediction is improved by quickly identifying changes in the maneuvering target state. To achieve this goal, a dual-scale track prediction network is designed, as shown in Figure 5. The input data consist of tracking measurements. For the long-scale network, a three-layer Bi-LSTM network structure is designed and trained for long-distance prediction. To prevent overfitting, a tanh activation layer and a dropout layer are added after each layer of the Bi-LSTM network. For the short-scale network, the number of Bi-LSTM layers is reduced, enabling faster response in short-distance prediction. The predicted values from both the long scale and the short scale are used to reconstruct the output tracks, resulting in new predicted tracks with higher accuracy.
The nonlinear activation functions sigmoid and tanh can be expressed as

$$\sigma(s) = \frac{1}{1 + e^{-s}}, \qquad \tanh(s) = \frac{e^{s} - e^{-s}}{e^{s} + e^{-s}}$$

where tanh stands for the hyperbolic tangent function and $s$ is the argument of the function. Since short-scale networks may not face severe vanishing-gradient problems, simpler activation functions such as LeakyReLU can be chosen, which is faster to compute and may be a suitable choice for short-scale networks:

$$\mathrm{LeakyReLU}(s) = \begin{cases} s, & s > 0 \\ \alpha s, & s \le 0 \end{cases}$$

where $\alpha$ represents the leakage rate in LeakyReLU and is typically chosen to be 0.01.
The optimizer chosen in this paper is adaptive moment estimation (Adam) [33], which combines ideas from two other optimization algorithms, Adagrad and RMSprop, to achieve good performance across a wide range of optimization problems. The dual-scale neural networks are optimized using the Adam optimizer to update the weight parameters of the model, minimize the prediction error, and improve the model's predictive performance.
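For reference, the activation functions and the Adam update named above can be written out as a short sketch. The Adam hyperparameter defaults (`beta1`, `beta2`, `eps`) are the commonly used values and are an assumption here, since this section does not state the values the authors used; the scalar form is for illustration only.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def tanh(s):
    return (math.exp(s) - math.exp(-s)) / (math.exp(s) + math.exp(-s))

def leaky_relu(s, alpha=0.01):
    """LeakyReLU with leakage rate alpha (0.01 as stated in the text)."""
    return s if s > 0 else alpha * s

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v
```

On the very first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by roughly one learning rate regardless of the gradient's magnitude.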



Sliding Window Prediction Track Reconstruction
Predicting an entire trajectory sequence at once can accumulate prediction errors and thereby degrade the model's performance. To reduce this effect, a "moving window" approach is used in this paper [23]. We define the sliding window parameters as follows: the segment size $T_{segment}$ is the temporal length of each trajectory segment within the total trajectory duration $T_{all}$, the overlap size $T_{overlap}$ is the temporal overlap between consecutive trajectory segments, and $n_{seg}$ is the total number of sliding trajectory segments. These parameters can be configured based on the motion characteristics of the target and the data acquisition frequency. The model predicts the output sequence at each time step, and the predicted output sequence is moved forward one time step in order to predict the output sequence at the next time step. The overlapping regions $x^{i}_{n_{start}:n_{overlap}}$ are averaged with the currently predicted overlapping regions $x^{i}_{n_{overlap}:n_{seg}}$ to reduce the prediction error. This approach reduces the accumulated error and improves the performance of the model. Figure 6 shows the single-scale track sliding window prediction reconstruction.
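The overlap-and-average reconstruction described above can be sketched as follows. The window layout (fixed segment length, fixed overlap, stride equal to their difference) and the function names are assumptions for illustration; the predictor itself is replaced by precomputed per-window outputs.

```python
def sliding_windows(n_points, seg_len, overlap):
    """Return (start, end) index pairs of segments of length seg_len sharing `overlap` points."""
    stride = seg_len - overlap
    if stride <= 0:
        raise ValueError("overlap must be smaller than the segment length")
    return [(s, s + seg_len) for s in range(0, n_points - seg_len + 1, stride)]

def reconstruct(n_points, windows, predictions):
    """Average the per-window predictions wherever windows overlap."""
    total = [0.0] * n_points
    count = [0] * n_points
    for (start, end), pred in zip(windows, predictions):
        for k, value in zip(range(start, end), pred):
            total[k] += value
            count[k] += 1
    # Points not covered by any window are returned as None.
    return [t / c if c else None for t, c in zip(total, count)]
```

With six points, a segment length of four, and an overlap of two, the two windows cover indices 0 to 3 and 2 to 5, and the two shared points receive the mean of both predictions.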
Setting the sliding window size is a key problem when the motion state changes. The sliding window size $T_{slid}$ refers to the length of the window used for analysis in time series data. With a sliding window, we can analyze and process continuous data. The sliding window size should match the time scale of the target state transition: if the target state transitions quickly, a shorter sliding window can be chosen to capture these rapid changes; if the target state changes slowly, a longer sliding window can be chosen to smooth the data and capture the long-term trend.
The Munkres algorithm [35] is used to find the best match between tracks, $(u^{1,2}_1, v_1), (u^{1,2}_2, v_2), \ldots, (u^{1,2}_k, v_k)$, where $u^{1,2}_k \in \{1, 2, \ldots, n_{\hat{p}}\}$, $v_k \in \{1, 2, \ldots, m_p\}$, and $k \le \min(n_{\hat{p}}, m_p)$. The pair $(u^{1,2}_k, v_k)$ indicates that point $u^{1,2}_k$ of a track predicted at one of the two scales is matched to point $v_k$ of the ground-truth track, and $n_{\hat{p}}$ and $m_p$, respectively, represent the number of data points in the two tracks. Let $\delta^{1,2}_{i,j}$ be the distance between point $i$ of the dual-scale predicted track and point $j$ of the ground-truth track, that is, $\delta^{1,2}_{i,j} = d(\hat{p}^{1,2}_i, p_j)$.
The OSPA distance can be used to determine whether the middle segments of the two tracks predicted by the dual-scale neural network deviate. Assume that the set of predicted values for the middle segment of the dual-scale track is $\hat{p}$ and the set of values for the middle segment of the ground-truth track is $p$. The OSPA distance is then computed from the following quantities: $\Pi_c$ represents all assignment schemes; $N$ indicates the number of time steps in the middle segment; $\hat{p}_i$ and $p_{\chi(i)}$ represent the dual-scale predicted value at time step $i$ and the ground-truth value assigned to it, respectively; $c$ is the matching cost coefficient, which is used to weigh the number of assigned elements against the distance between them; and $\mathrm{card}(\chi)$ represents the number of elements assigned in the assignment scheme $\chi$.
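As a hedged illustration of the matching and OSPA computation discussed above, the sketch below uses the widely cited general form of the OSPA metric with cut-off `c` and order `p`. It is not the paper's exact formula, which is not recoverable from this section, and it replaces the Munkres algorithm with a brute-force search over assignments, which gives the same optimum but is only practical for a handful of points.

```python
import math
from itertools import permutations

def ospa(pred, truth, c=10.0, p=2):
    """OSPA distance between two point sets (lists of coordinate tuples).

    c : cut-off (matching cost coefficient); p : order of the metric.
    The optimal assignment is found by brute force instead of Munkres (assumption).
    """
    if not pred and not truth:
        return 0.0
    small, large = (pred, truth) if len(pred) <= len(truth) else (truth, pred)
    m, n = len(small), len(large)
    best = float("inf")
    for assignment in permutations(range(n), m):
        cost = sum(min(c, math.dist(small[i], large[j])) ** p
                   for i, j in enumerate(assignment))
        best = min(best, cost)
    # Unassigned points in the larger set are each penalized with the cut-off c.
    return ((best + (c ** p) * (n - m)) / n) ** (1.0 / p)
```

A small OSPA value between the middle segments of the long-scale and short-scale predictions indicates that the two tracks agree, while a value near `c` indicates a deviation; the threshold applied to it is a design choice.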
to generate a set of trajectories for the target over a specified time period.


Preprocessing of Trajectory Data
For each maneuvering model, 30,000 trajectories were simulated, and three independent training datasets and three test datasets were created. For each training dataset, we randomly selected 80% of the trajectories to train the proposed model and used the rest as a validation set. Normalization to dimensionless quantities is necessary because different dimensions of the trajectories may have different units. Therefore, before being used in the method, all the trajectory data are normalized using min-max scaling as follows:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

where $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the corresponding trajectory dimension.
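The preprocessing described above can be sketched as follows. The function names, the fixed random seed, and the handling of a constant-valued dimension are assumptions added for illustration.

```python
import random

def min_max_scale(column):
    """Scale one trajectory dimension to [0, 1] with min-max normalization."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Constant dimension: map everything to 0 to avoid division by zero (assumption).
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

def split_train_val(items, train_fraction=0.8, seed=0):
    """Randomly split trajectories into training and validation sets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

The minimum and maximum found on the training data would normally be stored and reused to scale the validation and test data, so that all three sets share the same units.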

Neural Network Parameter Setting and Performance Analysis
The DS-Bi-LSTM track prediction network is a deep learning model for processing track data with stronger spatiotemporal modeling capability and higher prediction accuracy. The initial learning rate is a hyperparameter that controls the step size of each parameter update. The choice of initial learning rate usually depends on the specific problem and dataset. For larger datasets, a larger initial learning rate is typically chosen for faster parameter updates. A large learning rate can improve the convergence speed and quickly explore the parameter space at the beginning of training, so we set a relatively large initial learning rate of 0.01 for the long-scale network and, because of its smaller training set, an initial learning rate of 0.001 for the short-scale network.
In dual-scale neural networks, the number of neurons can impact the prediction performance. The number of neurons represents the complexity of the network. Larger numbers of neurons generally give the network higher capacity and expressive power, enabling better fitting of larger and more complex training datasets. However, using too many neurons increases training time, computational cost, and the risk of overfitting, thereby reducing prediction performance. Based on Figure 8a, the numbers of neurons in the dual-scale network are set to 70 and 30, respectively. The number of epochs is also a crucial training parameter. Increasing it may improve the model's performance on the training data, as it gives the model more opportunities to learn data features. However, if too many epochs are used, the model may start overfitting, leading to a decline in performance on the test data. According to Figure 8b, 75 epochs are selected for the short scale, while 225 epochs are chosen for the long scale.

Table 2 shows the parameters of the DS-Bi-LSTM trajectory prediction network for the long scale and the short scale, respectively. To train on track measurement values at different scales, the structural parameters of each scale network need to be adjusted accordingly. The short-scale network is intended to train quickly on abnormal prediction data and to detect state transitions in a timely manner, which motivates the parameter settings suggested in Table 2.

Table 3 shows the mean absolute percentage error (MAPE), mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE), and the coefficient of determination (R-value) for different configurations of Bi-LSTM layers in the long-scale network. When the number of Bi-LSTM layers is increased from one to two, the network's output accuracy improves significantly. This means that an additional Bi-LSTM layer helps the network capture more complex patterns and features in the data, leading to better predictions. When the number of layers is further increased to three, the R-value is the largest. The R-value measures how well the predictions fit the actual data; a higher R-value indicates that the network fits the data better and has stronger predictive power. However, when the number of layers is increased to four, the network becomes too complex and overfitting occurs. Overfitting happens when the model becomes too specialized in the training data and performs poorly on unseen data, such as the validation or test set. In this case, the prediction performance drops because the model does not generalize well to new data. Based on these observations, we set the number of Bi-LSTM layers to three. This choice strikes a balance between increasing model complexity to capture more patterns and avoiding overfitting.
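For reference, the error measures reported in Table 3 can be computed as in the sketch below. It uses the textbook definitions; in particular, the coefficient of determination is computed as 1 − SS_res/SS_tot, which is an assumption, since the paper does not spell out the formula behind its R-value. The function name and the sample values are hypothetical.

```python
import math
from typing import Dict, Sequence


def regression_metrics(
    actual: Sequence[float], predicted: Sequence[float]
) -> Dict[str, float]:
    """Compute MAPE (in percent), MAE, MSE, RMSE, and R^2 for paired samples."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    # MAPE is undefined when an actual value is zero.
    mape = 100.0 * sum(abs(e / a) for a, e in zip(actual, errors)) / n
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot
    return {"MAPE": mape, "MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}


# Hypothetical values, for illustration only.
truth = [100.0, 200.0, 300.0, 400.0]
estimate = [110.0, 190.0, 310.0, 390.0]
metrics = regression_metrics(truth, estimate)
```

Comparing such metrics on a held-out validation set across one to four Bi-LSTM layers is the kind of evidence Table 3 summarizes.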

Simulation Scenario Configuration
In this part, we determine the sliding window scale of the dual-scale network according to the motion state. Using multiple Bi-LSTMs to predict the trajectory of maneuvering targets, while accounting for the influence of the target motion state on the sliding window scale, is an effective approach. By adaptively adjusting the scale of the sliding window, the dynamic characteristics of the target in different motion states can be better captured.
The sharpness of a turn can be characterized by the turning angle, which refers to the angle by which the target changes course per unit time. The larger the turning angle, the faster the turn.
After the network parameters are determined, the prediction scale of the sliding window is also an important factor. The long- and short-scale sliding window settings are related to the motion parameters. For turning (fast turn, medium turn, slow turn), we set the minimum value of the sliding window according to the RMSE performance. The position RMSEs of different motion states, plotted against the sliding window scale, are shown in Figure 9a. At the same scale, the blue line represents the maximum error for rapid turns, while the red line corresponds to the minimum error for low-speed turns. For the same turning rate, solid lines represent the long scale, while dashed lines represent the short scale. As the turning rate increases, the error increases for the long scale, while the short scale exhibits good tracking performance for high turning rates. Figure 9b plots the segmenting values against the overlapping values. Yellow cells in the heat map indicate a high correlation coefficient, while green cells indicate a low correlation coefficient. Based on linear correlation, appropriate values for window segmentation and overlap are selected to ensure optimal windowing performance.
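The segment size and overlap discussed above can be made concrete with a minimal windowing sketch. The function name and the window values used here (6/2 for the long scale, 3/1 for the short scale) are hypothetical choices for illustration; the paper selects its values from Figure 9.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sliding_windows(
    samples: Sequence[T], segment_size: int, overlap: int
) -> List[Sequence[T]]:
    """Split a track into fixed-length windows sharing `overlap` samples."""
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    if not 0 <= overlap < segment_size:
        raise ValueError("overlap must satisfy 0 <= overlap < segment_size")
    step = segment_size - overlap
    last_start = len(samples) - segment_size
    return [samples[s : s + segment_size] for s in range(0, last_start + 1, step)]


# A toy track of 10 time steps; real inputs would be state vectors.
track = list(range(10))
long_windows = sliding_windows(track, segment_size=6, overlap=2)
short_windows = sliding_windows(track, segment_size=3, overlap=1)
```

A shorter window reacts faster to a change of turning rate but sees less history, which mirrors the trade-off between the dashed and solid curves described for Figure 9a.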
Scenario 2: For trajectory prediction at different scales, in order to achieve an overall performance improvement, this paper proposes OSPA threshold discrimination for trajectory reconstruction. Based on the threshold value, the short-scale and long-scale predictions are combined to reconstruct the target's trajectory more accurately.
As the motion state of a maneuvering target changes, the real-time short-scale predictions of the trajectory state closely follow the actual trajectory. However, due to the limited size of the training dataset, short-scale predictions tend to be smoother and are more susceptible to noise interference. On the other hand, long-scale predictions exhibit better performance when the trajectory is smooth and are less affected by noise interference.
When the dual-scale track error is within the OSPA-calculated threshold, the network selects the trajectory that is closer to the measurement point; otherwise, the network takes the average of the two trajectories as the reconstructed trajectory. The results are shown in Figure 10.
Table 4 lists the parameters of the three target trajectories. The performance advantage of the dual-scale neural network combined with GRMM is that it can improve the maneuvering target tracking performance of the uncertain model and simulate the real motion state according to the three common motion models of UAVs.
The second maneuvering target is simulated as an aircraft performing a 50 s turn, followed by a 50 s turn, and another 50 s turn. The third target performs a 50 s CV segment, followed by a 50 s turn and a 50 s CA segment. The corresponding results of the second and third targets are shown in Figures 12 and 13, respectively. The tracking results in Figures 11-13 show that DS-Bi-LSTM remains precise and stable for all trajectories. In these figures, the position RMSEs show that the DS-Bi-LSTM algorithm achieves the highest prediction accuracy in all simulation experiments compared with the other three algorithms. Specifically, the LSTM method relies on the original historical dataset, so its prediction fluctuates greatly when the motion state changes. Moreover, the IMM algorithm requires a known prior model transition probability, and its performance is greatly reduced if no prior probability is provided. Even when the prior model transition probability is known, the tracked state can sometimes be delayed, which also degrades tracking. In summary, our DS-Bi-LSTM algorithm outperforms the state-of-the-art LSTM and IMM algorithms for tracking maneuvering targets.
We have supplemented Figure 14 to illustrate drone tracking under different durations of uncertain maneuvers, providing an in-depth analysis of our algorithm. This addition aims to offer a more comprehensive evaluation of the model's robustness and effectiveness, particularly in addressing a broader range of scenarios with varying durations of maneuvers.
These findings confirm the effectiveness of the proposed dual-scale neural network in mitigating the effects of noise and improving the accuracy of moving target motion distance prediction, as shown in Figure 15. The position RMSE of maneuvering targets increases with process and measurement noise and shows an unstable trend. The lowest region of the position RMSE line plot indicates the noise range in which an algorithm performs best. Judging by the magnitude of change in the RMSE curves, the curve of the proposed algorithm tends to be smoother, whereas the other algorithms tend to be less stable when the noise changes. Overall, the experimental results validate the robustness of the proposed dual-scale neural network in different noise environments and highlight its potential for enhancing the accuracy and reliability of motion distance prediction in practical applications.
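The threshold-based reconstruction of Scenario 2 (pick the prediction closer to the measurement when the two scales agree within the threshold, otherwise average them) can be sketched as follows. This is only an illustration under stated assumptions: the "dual-scale track error" is taken to be the per-step Euclidean distance between the long- and short-scale predictions, that distance stands in for the OSPA metric, and the threshold is passed in as a fixed number rather than computed. All names and sample values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def reconstruct(
    long_pred: Sequence[Point],
    short_pred: Sequence[Point],
    measurements: Sequence[Point],
    threshold: float,
) -> List[Point]:
    """Fuse long- and short-scale predictions step by step."""
    fused: List[Point] = []
    for lp, sp, z in zip(long_pred, short_pred, measurements):
        # Assumption: Euclidean distance replaces the OSPA distance here.
        if math.dist(lp, sp) <= threshold:
            # Scales agree: keep the prediction closer to the measurement.
            fused.append(lp if math.dist(lp, z) <= math.dist(sp, z) else sp)
        else:
            # Scales disagree: use the average of the two predictions.
            fused.append(tuple((a + b) / 2.0 for a, b in zip(lp, sp)))
    return fused


# Hypothetical 3D positions for two time steps.
long_track = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
short_track = [(1.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
measured = [(0.8, 0.0, 0.0), (14.0, 0.0, 0.0)]
reconstructed = reconstruct(long_track, short_track, measured, threshold=5.0)
```

In this toy run the first step keeps the short-scale point, because the two scales agree and it lies closer to the measurement, while the second step falls back to the average.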

Discussion
The high maneuverability and diverse trajectory modes of UAVs pose several challenges to object tracking. Traditional filtering models rely heavily on prior parameters and therefore degrade significantly when those parameters do not match the motion of the maneuvering target. To overcome these challenges, we use a neural network to determine the Markovian prior transition probability, which improves the accuracy of target motion model switching, and we propose a dual-scale neural prediction network to solve the state delay problem of the interacting model. The proposed method improves the tracking performance for maneuvering targets and is suitable for agile UAVs.
First, a historical trajectory database is generated using data-driven and machine-learning techniques. This database is then used to train a model that predicts the switching behavior of the model under different environments and conditions. We also tackle the temporal prediction problem for maneuvering targets, which is influenced by state transitions. Taking turn rate as an example, we design appropriate sliding window scales based on turning rate analysis, as shown in Figure 9. The proposed dual-scale maneuvering target state prediction algorithm reduces the prediction deviation caused by training set influence that is observed with single-scale predictions, as shown in Figure 10. Because single-scale predictions are sensitive to state transitions, they tend to deteriorate when the maneuvering target switches between motion modes. By using the OSPA distance for judgment, our dual-scale neural network selects the more accurate scale for trajectory reconstruction, enhancing the overall tracking performance and addressing issues faced by separate long- and short-scale predictions, such as prediction deviation or poor tracking performance under noise. Figures 11-14 show tracking results for maneuvering targets with various motion patterns and illustrate that, within the GRMM-CIF framework, our dual-scale neural network adapts well during target transitions and achieves better accuracy over the entire tracking period than traditional tracking algorithms. Figure 15 analyzes how different levels of process noise and measurement noise affect filtering when various algorithms track the same maneuvering target. The proposed dual-scale neural network algorithm is found to be more robust to such interference.
Typically, process noise and measurement noise are unknown; however, in this paper, we assume that the noise is known and remains constant in order to validate the algorithm performance. In future investigations, we will consider conditions closer to practical detection environments, such as time-varying noise and loss of the detected target. Additionally, we will explore the possibility of extending the proposed approach to multi-station UAV swarm target tracking.

Conclusions
In summary, we present a novel hybrid-driven multi-model discrete-time system filter for tracking maneuvering targets, such as UAVs. This filter leverages the advantages of the underlying system knowledge obtained from big data and the domain-specific expertise in target dynamics. By synergistically integrating these two sources of information, we aim to enhance the accuracy and efficiency of target tracking.
The GRMM-CIF filtering architecture is established to filter and track the measured values of the target using multiple motion models, effectively addressing the challenge of modeling uncertain target motion. By avoiding data dependency in the neural network when the motion state changes, the method improves the accuracy of tracking. In trajectory reconstruction, the model can choose a sliding window of appropriate length to capture motion information. The performance advantage of the dual-scale neural network combined with GRMM is that it can improve the maneuvering target tracking performance of the uncertain model.
The DS-Bi-LSTM algorithm is devised to tackle the prediction delay issue arising from target state changes under multiple models. This algorithm enables swift assessment of target motion as the motion state of a maneuvering UAV varies, thereby ensuring timely and precise predictions. The dual-scale network consistently and robustly outperforms the other algorithms, showing fewer prediction errors and better tracking ability at various noise levels. We confirm the effectiveness of the proposed dual-scale neural network in mitigating the effects of noise and improving the accuracy of motion distance prediction for maneuvering targets.
The results presented herein demonstrate the algorithm's performance in terms of tracking accuracy, robustness, and adaptability across varying environmental conditions. Furthermore, compared with classical target tracking algorithms, our algorithm responds faster to maneuvers and state transitions, thereby significantly reducing peak tracking errors. In future research, we might explore extending the proposed algorithm to a multi-station fusion structure to further enhance its tracking performance.

Figure 2. Diagram of multi-model maneuvering target tracking based on dual-scale deep learning.

Figure 5. Neural network structure for maneuvering target trajectory prediction.

Figure 8. The influence of different parameters on dual-scale neural networks. (a) Number of hidden units in the Bi-LSTM layer. (b) Epoch.

Figure 9. Sliding window scales set according to the turning rate. (a) Dual-scale prediction performance. (b) Correlation coefficient heatmap between segment size values and overlap values.

Figure 10. Real and reconstructed trajectories of the maneuvering target. (a) The whole reconstructed trajectory. (b) Enlarged true and predicted trajectories at Position 1. (c) Enlarged true and predicted trajectories at Position 2. (d) Enlarged true and predicted trajectories at Position 3.

Table 4 (surviving entries only). End of the preceding row: ϖ3 = 3.5°/s. Target 3: initial state [1100 m, 800 m, 500 m, 10 m/s, 6 m/s, 1 m/s]; 50 s, CV mode; 50 s, CT mode, ϖ2 = −4.5°/s; 50 s, CA mode, a = [6 5 3] m/s².

For different tracking algorithms, Figure 11 presents the tracking comparison results of the first maneuvering target. The first maneuvering target is simulated as an aircraft performing a 50 s turn, followed by a 50 s constant speed segment, and another 50 s turn. The estimated trajectories of the four algorithms in this experiment are shown in Figure 11a-d, which also show the true trajectory and the corresponding measurements. The transition probabilities of the maneuvering target motion states of our proposed method are shown in Figure 11e. Finally, the prediction accuracy in terms of position RMSE of the four algorithms is shown in Figure 11f.

Figure 11. Tracking performance comparison of four algorithms for the first maneuvering target. (a) True and predicted trajectories in 3D Cartesian coordinate system. (b) True and predicted trajectories in X and Y directions. (c) True and predicted trajectories in X and Z directions. (d) True and predicted trajectories in Y and Z directions. (e) Transition probabilities of the maneuvering target motion states. (f) Position RMSE.

Figure 12. Tracking performance comparison of four algorithms for the second maneuvering target. (a) True and predicted trajectories in 3D Cartesian coordinate system. (b) Transition probabilities of the maneuvering target motion states. (c) Position RMSE.

Figure 13. Tracking performance comparison of four algorithms for the third maneuvering target. (a) True and predicted trajectories in 3D Cartesian coordinate system. (b) Transition probabilities of the maneuvering target motion states. (c) Position RMSE.

Figure 15. RMSE performance comparison of four filtering algorithms with various noise levels.
$r_k$, Doppler velocity $v_k$, azimuth angle, and pitch angle measured by the radar to the target. $H_k$ is the nonlinear 3D range measurement function.

and k o( ) represent the input gate, forgetting gate, feature extraction, and output gate, respectively. $\hat{x}_k$ is the input at moment $k$.

Table 3. Performance of network structure with different numbers of layers for trajectory prediction.