BOTNet: Deep Learning-Based Bearings-Only Tracking Using Multiple Passive Sensors

Bearings-only target tracking is commonly used in many fields, such as air or sea traffic monitoring, tracking a member in a formation, and military applications. When tracking with synchronous passive multisensor systems, each sensor provides a line-of-sight measurement. These measurements are plugged into an iterative least squares algorithm to estimate the unknown target position vector. Instead of using iterative least squares, this paper presents a deep-learning-based framework for the bearings-only target tracking process, applicable to any bearings-only target tracking task. As a data-driven method, the proposed deep-learning framework offers several advantages over the traditional iterative least squares. To demonstrate the proposed approach, a scenario of tracking an autonomous underwater vehicle approaching an underwater docking station is considered. There, several passive sensors are mounted near the docking station to enable accurate localization of the approaching autonomous underwater vehicle. Simulation results show that the proposed framework obtains better accuracy than the iterative least squares algorithm.


Introduction
Bearings-only target tracking is commonly used in several domains, such as air [1,2], space [3], and underwater [4]. In this method, passive sensors provide line-of-sight (LOS) measurements from the sensor to the target. The LOS is a function of the sensor-target azimuth and elevation angles. Those angles are nonlinear functions of the target position in Cartesian coordinates and the sensor position. Thus, localization cannot be achieved with a single LOS measurement, since each LOS provides only two measured angles while the unknown is the three-dimensional target position vector. As a consequence, several LOS measurements are required for the localization process. To that end, two options are available: (1) using a single moving passive sensor located on the vehicle [5,6] and (2) an array of sensors pre-placed in fixed and known positions on the seabed [7] or on the ground [8].
Considering the single moving passive sensor option, commonly the target movement is modeled by a constant velocity model, which is then used as a system model in a Kalman framework or any other nonlinear filter [9,10].
One popular method to solve a set of LOS measurements from multiple sensors in a single epoch is the iterative least squares (ILS) algorithm [11]. In general, for a linear scenario, a variety of least squares algorithms may be employed, such as total least squares [12] and generalized least squares [13]. For a nonlinear problem, linearization is applied to convert the nonlinear setup into an iterative algorithm to process the measurements. In the ILS algorithm, given initial conditions, iterations are repeated until a predefined error threshold is reached [10]. ILS can be used for several target tracking related scenarios, like single moving target tracking with synchronous [8] or asynchronous [14] LOS measurements collected from several passive sensors, or multi-target tracking [15]. There, several moving robots along with robot observations, target positions, and velocities are modeled as a graph, which is later solved as a nonlinear least squares problem. In addition to target tracking, ILS is also used in 3D model compression [16], where it gives a more compact representation while maintaining objects' geometries and topologies. ILS is also used in image denoising [17], where the noisy image passes through a wavelet filter and then least squares regularization is applied to make the smoothed image similar to the noisy image.
Focusing on underwater environments, one of the most challenging localization problems concerns the autonomous underwater vehicle (AUV) docking scenario. AUV docking onto a predefined location can be used to transfer data or charge the AUV without pulling it out of the sea. While several works suggest localization using vision-based algorithms, the localization solution assumes the docking panel's shape is known [18], and in several approaches, lamps are located at the entrance of the docking station to provide clearer images [19]. The performance of vision-based approaches depends on water turbidity and visibility. In other solutions, multiple sensors, both active and passive, are combined, thereby providing more accurate position estimates [20,21]. A combination of vision and non-vision based solutions exists, as well [22]. This paper presents a deep-learning-based framework for the bearings-only target tracking process, instead of ILS, named BOTNet: bearings-only tracking network. BOTNet is applicable to any bearings-only target tracking task, regardless of the operating target environment. As a data-driven method, the proposed deep-learning framework offers several advantages over the traditional iterative least squares. To begin with, it is a data-driven approach that implicitly learns the data distribution, then generalizes and predicts the unknown target position based on LOS measurements. In addition, the framework can be focused to learn limited or specific scenario parameters to improve its performance, a feature not available in traditional ILS. Moreover, no initialization process or initial guess is required, in contrast to the traditional ILS algorithm. Yet, the proposed network operates with a specific, predefined input dimension. This translates into a fixed number of LOS measurements.
Thus, a different network is required to match each number of expected LOS measurements, unlike the traditional ILS which can operate without any limitations on the number of LOS measurements.
To demonstrate the proposed approach, a scenario of tracking an AUV approaching an underwater docking station is considered. There, several passive sensors (e.g., cameras) are placed near a docking station to enable accurate localization of an approaching autonomous underwater vehicle.
The rest of the paper is organized as follows: Section 2 presents the problem formulation of target location estimation using LOS measurements, Section 3 presents the deep-learning-based approach, and Section 4 compares the ILS method and the deep-learning-based approach for several test cases. Section 5 gives the conclusions.

Problem Formulation
To formulate the ILS problem, a scenario of n passive sensors providing LOS measurements to a target with an unknown position is considered. Two options for the sensor positions may be considered: (1) all sensor position vectors are known in some global Cartesian coordinate frame, and (2) all sensor position vectors are known in some local Cartesian coordinate frame, for example, relative to one of the sensor positions. In the former, the target position is determined in the global frame, while, in the latter, in the local frame. Regardless of the coordinate frame choice, the mathematical formulation is identical. Therefore, it is assumed that all sensor position vectors are known.
Let n be the number of sensors, each located at a fixed position in Cartesian coordinates,

p_i = [x_i, y_i, z_i]^T,  i = 1, ..., n,  (1)

where p_i is the position vector of sensor i. Each sensor provides one LOS measurement of the (unknown) target position, p_t, represented in Cartesian coordinates as

p_t = [x_t, y_t, z_t]^T.  (2)

The LOS measurement from sensor i is a function of the sensor location and the unknown target location:

z_i = h_i(p_t, p_i) + v_i,  (3)

where v_i is a random noise modeled as zero-mean white Gaussian noise. The nonlinear function in (3) is defined as

h_i(p_t, p_i) = [α_i, ε_i]^T,  (4)

α_i = arctan( (y_t − y_i) / (x_t − x_i) ),  (5)

ε_i = arctan( (z_t − z_i) / √((x_t − x_i)² + (y_t − y_i)²) ),  (6)

where α_i is the azimuth angle and ε_i is the elevation angle of sensor i. The composite measurement (holding all sensor LOS measurements) is formed using a maximum likelihood (ML) criterion by maximizing the likelihood function of the target state x at the fusion time t_f, based on the set of measurements M. This likelihood is the probability density function of the measurement set conditioned on x(t_f); namely,

Λ(x) = p(M | x(t_f)),  (7)

where Λ(x) is the likelihood function of the unknown parameter vector to be estimated, and

λ(x) = −ln Λ(x) = c + (1/2) [M − h(x)]^T R^{−1} [M − h(x)]  (8)

is the negative log-likelihood function [23], with c a constant. R is the measurement-noise covariance matrix,

R = σ² I_{2n},  (9)

where it is assumed all measurements have equal variance σ². Thus, the ML problem becomes a nonlinear least squares problem, and, consequently, the ML estimate can be obtained using the iterated least squares (ILS) algorithm [10]. To that end, let M be a vector consisting of all measurements arriving from the n sensors at the same epoch:

M = [z_1^T, z_2^T, ..., z_n^T]^T.  (10)

The composite measurement (10) is a function of the unknown target position (2). The ILS algorithm, which maximizes (7), yields after the jth iteration the estimate of the target state [10]:

x̂_{j+1} = x̂_j + (H_j^T R^{−1} H_j)^{−1} H_j^T R^{−1} [M − h(x̂_j)],  (11)

where R is defined in (9). The Jacobian matrix is

H_j = [H_1^T, H_2^T, ..., H_n^T]^T |_{x = x̂_j},  (12)

where H_i, corresponding to measurement i, is

H_i = ∂h_i/∂x = [ ∂α_i/∂x_t  ∂α_i/∂y_t  ∂α_i/∂z_t ; ∂ε_i/∂x_t  ∂ε_i/∂y_t  ∂ε_i/∂z_t ].  (13)

The partial derivatives in (13) are

∂α_i/∂x_t = −(y_t − y_i)/ρ_i²,  ∂α_i/∂y_t = (x_t − x_i)/ρ_i²,  ∂α_i/∂z_t = 0,  (14)−(16)

∂ε_i/∂x_t = −(x_t − x_i)(z_t − z_i)/(r_i² ρ_i),  ∂ε_i/∂y_t = −(y_t − y_i)(z_t − z_i)/(r_i² ρ_i),  ∂ε_i/∂z_t = ρ_i/r_i²,  (17)−(19)

with ρ_i = ||[x_t − x_i, y_t − y_i]^T|| and r_i = ||p_t − p_i||, where || · || is the Euclidean norm.
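The azimuth/elevation measurement function and its Jacobian (13) can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation; the function names `los_measurement` and `los_jacobian` are ours.

```python
import numpy as np

def los_measurement(p_t, p_i):
    """Noise-free LOS measurement [azimuth, elevation]^T from sensor p_i to target p_t."""
    dx, dy, dz = p_t - p_i
    az = np.arctan2(dy, dx)                 # azimuth alpha_i
    el = np.arctan2(dz, np.hypot(dx, dy))   # elevation epsilon_i
    return np.array([az, el])

def los_jacobian(p_t, p_i):
    """Jacobian H_i = dh_i/dp_t of the azimuth/elevation angles, Eq. (13)."""
    dx, dy, dz = p_t - p_i
    rho2 = dx * dx + dy * dy                # squared horizontal range
    rho = np.sqrt(rho2)
    r2 = rho2 + dz * dz                     # squared slant range
    return np.array([
        [-dy / rho2,             dx / rho2,             0.0],
        [-dx * dz / (r2 * rho), -dy * dz / (r2 * rho),  rho / r2],
    ])
```

Using `arctan2` instead of a plain arctangent keeps the azimuth well defined in all four quadrants; the Jacobian can be checked against a finite-difference approximation of `los_measurement`.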
Finally, the initial estimate x̂_0 required by the ILS method is computed from the positions of two sensors and their corresponding LOS measurements [24]:

x̂_0 = [x_0, y_0, z_0]^T,  (20)

x_0 = (y_2 − y_1 + x_1 tan α_1 − x_2 tan α_2)/(tan α_1 − tan α_2),
y_0 = y_1 + tan α_1 (x_0 − x_1),
z_0 = z_1 + tan ε_1 √((x_0 − x_1)² + (y_0 − y_1)²),  (21)

where [x_1, y_1, z_1]^T is the sensor #1 location in the local Cartesian coordinates, [α_1, ε_1]^T are the measured azimuth and elevation angles from sensor #1, [x_2, y_2, z_2]^T is the sensor #2 location in the local Cartesian coordinates, and [α_2, ε_2]^T are the measured azimuth and elevation angles from sensor #2.
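Putting the two-sensor initialization and the ILS iteration (11) together gives the following minimal sketch. It assumes equal measurement variances, so R = σ²I cancels out of (11); the triangulation in `initial_guess` is the standard intersection of the two azimuth lines in the horizontal plane plus depth recovery from sensor #1's elevation, and all names are ours.

```python
import numpy as np

def h(p_t, p_i):
    dx, dy, dz = p_t - p_i
    return np.array([np.arctan2(dy, dx), np.arctan2(dz, np.hypot(dx, dy))])

def H_block(p_t, p_i):
    dx, dy, dz = p_t - p_i
    rho2 = dx * dx + dy * dy
    rho = np.sqrt(rho2)
    r2 = rho2 + dz * dz
    return np.array([[-dy / rho2, dx / rho2, 0.0],
                     [-dx * dz / (r2 * rho), -dy * dz / (r2 * rho), rho / r2]])

def initial_guess(p1, z1, p2, z2):
    """Two-sensor triangulation; degenerates if the azimuth lines are parallel."""
    t1, t2 = np.tan(z1[0]), np.tan(z2[0])
    x = (p2[1] - p1[1] + p1[0] * t1 - p2[0] * t2) / (t1 - t2)
    y = p1[1] + t1 * (x - p1[0])
    z = p1[2] + np.tan(z1[1]) * np.hypot(x - p1[0], y - p1[1])
    return np.array([x, y, z])

def ils(sensors, measurements, n_iter=20):
    """Iterated least squares, Eq. (11), with equal measurement variances."""
    x = initial_guess(sensors[0], measurements[0], sensors[1], measurements[1])
    M = np.concatenate(measurements)
    for _ in range(n_iter):
        Hj = np.vstack([H_block(x, p) for p in sensors])
        r = M - np.concatenate([h(x, p) for p in sensors])
        r = (r + np.pi) % (2 * np.pi) - np.pi      # wrap angle residuals
        x = x + np.linalg.solve(Hj.T @ Hj, Hj.T @ r)
    return x
```

On noise-free measurements the initial guess is already exact and the iteration leaves it unchanged; with noisy angles the loop refines the triangulated starting point.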

Deep Learning ILS
As described in Section 2, ILS is a common method for parameter estimation based on some model connecting the measurements and the unknown vector to be estimated. Linearization is applied to convert the nonlinear problem into an iterative algorithm to process the measurements. To improve the accuracy, the procedure is repeated iteratively, with the result of one iteration used as the linearization point of the next. Hence, the number of iterations influences the accuracy of the solution [11,25]. In the proposed approach, a deep learning framework is set to replace the traditional ILS algorithm. A block diagram showing the traditional ILS and the proposed approach is presented in Figure 1.
The proposed deep learning framework has several benefits:
• The data-driven solution implicitly learns the training data probability distribution to generalize and predict the values of the required unknown parameters based on the given measurements.
• It can be focused to learn specific scenario parameters to improve its accuracy.
• It does not require initialization to start its operation.
• Deep networks have an inherent property of denoising the data [26] from the noisy measurements, which could be utilized to improve performance.
On the other hand, a deep learning architecture operates with a specific, predefined input dimension. Here, this translates into a fixed number of LOS measurements. Thus, a different network is required for each number of expected LOS measurements. A typical general scenario relevant for the proposed approach consists of:
• An array of passive sensors, each located in a fixed position.
• A moving target.
In this paper, an AUV docking scenario is considered; thus, the passive sensors are located near a docking station, and the moving target is the AUV. Full details of the scenario are provided in Section 4.1. When constructing and training a deep neural network, the goal is to find an approximation to a function connecting the input to the network (LOS measurements) to its output (target position vector). The approximation takes the form of a deep neural network with a specific structure and parameters. Full details of our specific approximation, i.e., BOTNet, are described later in this section.
BOTNet is constructed such that its output is the estimated target position vector p̂_t. The input consists of the sensor locations (in local Cartesian coordinates) and their corresponding LOS measurements. To that end, the measured azimuth and elevation angles are used to construct the unit LOS vector from sensor i:

u_i = [cos ε_i cos α_i, cos ε_i sin α_i, sin ε_i]^T.  (22)

The BOTNet structure comprises fully connected layers. Fully connected networks, also commonly known as deep feedforward networks or feedforward neural networks, are the most basic form of deep neural networks. They assume no relation between the input entries; therefore, they are applicable to a wide range of problems.
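Constructing the unit LOS vector from the measured angles can be sketched as follows; the component ordering shown (x-east, y-north, z-up with elevation measured from the horizontal plane) is an assumption for illustration, and `los_vector` is a name of ours.

```python
import numpy as np

def los_vector(azimuth, elevation):
    """Unit LOS vector built from measured angles; every entry lies in [-1, 1]."""
    return np.array([
        np.cos(elevation) * np.cos(azimuth),
        np.cos(elevation) * np.sin(azimuth),
        np.sin(elevation),
    ])
```

Because the vector has unit norm, each component is bounded in [−1, 1] regardless of the angle convention, which is exactly the property exploited for input normalization.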
The fully connected layers are arranged in a chain-like structure such that each layer's input is the output of the previous layer. In such a layer, each node, also named a neuron, receives as input the outputs of all the neurons in the previous layer. The first layer is defined by [27]:

h^(1) = g(W^(1) x + b^(1)),  (23)

where x is the input sample, W is the weight matrix, b is the bias vector, and g is the activation function. The weights and biases create a mapping between the neurons in the current layer and the neurons of the previous layer. Their exact values are computed during the network optimization process such that the loss function is minimized. The i-th layer is defined by:

h^(i) = g(W^(i) h^(i−1) + b^(i)),  (24)

where h^(i−1) is the output of the previous layer. Each layer uses the rectified linear unit (ReLU) as a nonlinear activation function. ReLU is defined as [28]:

g(z) = max(0, z),  (25)

where z is the input to the activation function. The detailed architecture is described in Figure 2. The number of layers in a feedforward neural network can vary, as can the number of neurons in each layer. These values, among others, are named hyper-parameters. The specific values for BOTNet are described in Figure 2 and later in this section. The network approximation should be as close as possible to the real function relating the network input to its output. This is achieved by minimizing a loss function associated with the problem at hand. The root mean square error (RMSE) was chosen as the loss function since a regression problem is addressed. The input to the loss function is the estimated target location for sample i, p̂_t^(i), and the actual target location for sample i, p_t^(i), each a vector of dimension d = 3. The loss function is computed over the data samples using:

RMSE(P̂_t, P_t) = √( (1/n) Σ_{i=1}^{n} ||p̂_t^(i) − p_t^(i)||² ),  (26)

where P̂_t is the set of n estimated target locations, and P_t is the set of the corresponding n actual target locations.
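The chained fully connected layers with ReLU activations and the RMSE loss can be sketched with plain NumPy. The layer sizes in the usage note below are illustrative only, not the actual BOTNet hyper-parameters, and the function names are ours.

```python
import numpy as np

def relu(z):
    """Rectified linear unit, Eq. (25)."""
    return np.maximum(0.0, z)

def forward(x, params):
    """Chain of fully connected layers, Eqs. (23)-(24); linear output layer.

    params is a list of (W, b) pairs; the last pair produces the
    3-dimensional position estimate without an activation.
    """
    h = x
    for W, b in params[:-1]:
        h = relu(W @ h + b)
    W, b = params[-1]
    return W @ h + b

def rmse_loss(p_est, p_true):
    """Batch RMSE between estimated and true target positions, Eq. (26)."""
    diff = p_est - p_true              # shape (n, 3)
    return np.sqrt(np.mean(np.sum(diff ** 2, axis=1)))
```

For example, with an input of dimension 18 (three sensor positions plus three LOS vectors, each of dimension 3) and two hidden layers of 32 neurons, `forward` maps an 18-vector to a 3-vector position estimate.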
During the training stage of BOTNet, the actual target position p_t is given to the network, and BOTNet is optimized to find a set of parameters W, b such that the loss function is minimized. Since the input dimensionality is small, the network's training process is fast. Hyper-parameter values were optimized using the Bayesian optimization tool [29]. Training consisted of 1000 epochs with a learning rate of 0.0001 and a batch size of 100. During testing, BOTNet returns an estimate of the target position vector for each sample. This output is compared with the actual target position p_t, and then metrics, such as the RMSE, are computed to examine the network's ability to estimate the target location for unseen data samples. During this work, several other architectures were examined, but only the best one (in terms of minimum error) is described herein, since the goal of this paper is to derive the BOTNet framework rather than to find the optimal network. The latter task is left for future work.

Analysis and Results
Several scenarios are defined to evaluate the performance of the proposed framework with BOTNet and to compare its performance relative to the traditional ILS. Those scenarios are:
• The influence of different types of network input, i.e., the representation of the measurements and sensor locations, is addressed in Section 4.2.
• The noise level impact on the localization process is addressed in Section 4.3.
• Different test datasets, created within the grid boundaries, are used to evaluate the network performance in Section 4.4.
• BOTNet's ability to generalize to unseen data from a test dataset with a different distribution, created outside the grid boundaries, is examined in Section 4.5.
• The influence of a varying number of sensors, each providing a single LOS measurement, is addressed in Section 4.6.
The performance measure used to evaluate ILS and BOTNet in all of the above scenarios is RMSE (26).

Scenario and Dataset
A numerical simulation was used to generate the dataset and to perform the comparison with traditional ILS, resulting in 2800 different AUV locations in the grid. To that end, the following parameters and assumptions are made:
• The number of sensors, denoted by n, is equal to 3, each providing a single LOS measurement.
• The sensor locations are fixed on a horizontal plane, forming an equilateral triangle with a 50 m edge length, as shown in Figure 3.
The true LOS angles were calculated using plane geometry. To construct the measured LOS, zero-mean white Gaussian noise with a standard deviation of 0.01 rad was added.
The dataset was created as follows:
1. The AUV location was picked from one of the possible locations described in Section 4.1.
2. The LOS measurements were generated according to the chosen AUV location and the fixed sensor locations.
For each given AUV location, the true LOS vector was calculated, followed by ten different realizations of the corresponding LOS measurement, resulting in a total of 28,000 LOS measurements in the defined grid. An example of three sensor locations (green) and an AUV location (red) can be seen in Figure 3. The dataset was split into training and test sets with 80% and 20% of the samples, respectively. Unless stated otherwise, splitting into training and test sets is performed randomly.
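The dataset generation procedure above can be sketched as follows. The grid of AUV locations used here is a hypothetical stand-in (the paper's exact grid dimensions are not reproduced in this sketch), while the triangle geometry, noise level, number of realizations, and 80/20 split follow the scenario description.

```python
import numpy as np

rng = np.random.default_rng(42)
SIGMA = 0.01  # measurement noise STD [rad]

# Three sensors on an equilateral triangle with 50 m edges (horizontal plane).
edge = 50.0
sensors = np.array([[0.0, 0.0, 0.0],
                    [edge, 0.0, 0.0],
                    [edge / 2, edge * np.sqrt(3) / 2, 0.0]])

def true_los(p_t, p_i):
    """Noise-free [azimuth, elevation] from sensor p_i to AUV location p_t."""
    dx, dy, dz = p_t - p_i
    return np.array([np.arctan2(dy, dx), np.arctan2(dz, np.hypot(dx, dy))])

# Hypothetical grid of possible AUV locations near the docking station.
grid = np.array([[x, y, z]
                 for x in np.arange(-20.0, 81.0, 20.0)
                 for y in np.arange(-20.0, 81.0, 20.0)
                 for z in np.arange(-40.0, -14.0, 5.0)])

samples, labels = [], []
for p_t in grid:
    clean = np.concatenate([true_los(p_t, p_i) for p_i in sensors])
    for _ in range(10):                          # ten noisy realizations per location
        samples.append(clean + rng.normal(0.0, SIGMA, clean.shape))
        labels.append(p_t)
X, Y = np.array(samples), np.array(labels)

# Random 80/20 train/test split.
idx = rng.permutation(len(X))
split = int(0.8 * len(X))
X_train, X_test = X[idx[:split]], X[idx[split:]]
Y_train, Y_test = Y[idx[:split]], Y[idx[split:]]
```

Each sample is a 6-vector (three sensors, two angles each) labeled with the 3-vector AUV position; in BOTNet's input these angles would further be converted to unit LOS vectors.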

Input Data Normalization
For the localization task at hand, an accurate representation of the data is crucial to prevent localization errors. The basic option is to define the input using the measured azimuth and elevation angles from each sensor. In that manner, the range of the angles could be [0, 2π] or [−π, π]. Instead, in this paper, the LOS vector measurement (22) is used. This ensures that all input values lie in [−1, 1].
As the input to the network also includes the known sensor positions, two options were examined. In the first one, the sensor locations were not normalized, while, in the second option, they were normalized to the range of [−1, 1].
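One simple way to bring the sensor positions into [−1, 1] is per-coordinate min-max scaling, sketched below. The paper does not spell out the exact normalization scheme, so treat this as an assumption; `normalize_positions` is a name of ours.

```python
import numpy as np

def normalize_positions(positions):
    """Min-max scale each coordinate of the sensor positions into [-1, 1].

    positions: array of shape (n_sensors, 3). A coordinate that is constant
    across all sensors (zero span) is mapped to -1 rather than dividing by zero.
    """
    lo = positions.min(axis=0)
    hi = positions.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # guard constant coordinates
    return 2.0 * (positions - lo) / span - 1.0
```

With the equilateral-triangle layout of Section 4.1 this maps the x coordinates of the three sensors to −1, 1, and 0, keeping the position inputs on the same scale as the unit LOS vectors.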
BOTNet with both input types (non-normalized and normalized sensor positions) and the ILS algorithm were evaluated on the test dataset. The RMSE results are presented in Table 1. As can be seen, both input types improved upon the ILS performance; in particular, when using normalized sensor locations, an improvement of approximately 40% was achieved. This result shows that, for the given scenario parameters, the proposed BOTNet outperforms the traditional ILS approach.

Noise Sensitivity
To check the effect of the noise magnitude σ on the target position error of both ILS and BOTNet, several datasets were created in the same fashion as described in Section 4.1, only with different noise levels. The STD values are in the range of σ ∈ [0.002, 0.02] radians with a step of 0.002 radians between successive values. In total, 10 datasets were created and evaluated. As can be seen in Figure 4, BOTNet outperformed the ILS method for all examined noise levels.

Dataset Influence
Several options for splitting the dataset within the original grid and its influence on BOTNet's performance are examined. The motivation behind this analysis is to check BOTNet's ability to estimate the target's position vector given unseen data, which might have different properties in comparison to samples given to the network during training. To that end, the following cases are considered for the test dataset:

1. Case A: Random split using the grid of possible AUV locations described in Section 4.1.
2. Case B: Random split as in Case A, with the restriction that all measurements taken from a given location must appear in either the training set or the test set, but not both. That is, the test dataset includes AUV locations not present in the training dataset. Case B is visualized in Figure 5, showing possible AUV positions for the train and test datasets in a two-dimensional grid.
3. Case C: Grid offset. This dataset is created using the same guidelines described in Section 4.1, but adding an offset of two meters in the x and y axes to each possible AUV location in the grid.
4. Case D: Random grid. In all previous cases, the grid was chosen deterministically using some conditions. Here, the possible AUV location is chosen randomly within the grid boundaries defined in Section 4.1: in each cell of the original grid, an AUV location is randomly selected.
Cases C and D are visualized in Figure 6. In each of the described scenarios, BOTNet outperformed ILS, as shown in Figure 7. The largest difference between BOTNet and ILS is 47%, achieved in Case A, and the smallest, 9%, in Case B. For Cases C and D, the differences are 20% and 32%, respectively. This shows that, unlike ILS, BOTNet can generalize and return better AUV location estimates as long as they lie within the grid present in the training dataset.

Network Robustness
In this case, the possible AUV locations in the test dataset were created outside the boundaries of the original grid (described in Section 4.1). To that end, the AUV locations were selected by moving them outward from the grid in the following manner: a single face of the three-dimensional rectangular cuboid grid was chosen, and from this face, the AUV positions were moved outward by a constant baseline. This procedure was repeated for all six faces. For the x and y axes, the baseline was 40 m, while, for the z axis, the baseline was 25 m.
All six possibilities were examined, with each of the three axes moving in the positive or negative direction. This procedure moved the possible AUV locations to a new grid as follows: for the x and y axes, [80, 156] m in the positive direction and [−80, −4] m in the negative one; for the z axis, [60, 85] m in the positive direction and [−40, −15] m in the negative one. There, the grid step size is four meters for the x and y axes and five meters for the z axis.
For each possible AUV location in this grid, 100 measurements of LOS were simulated to create the test dataset.
Results for the positive y and z axes are shown in Figures 8 and 9 (the behavior along the x axis is identical to that along the y axis). The results for the negative x, y, and z directions are not presented, as the behavior is identical to that of the positive directions.
In all six possibilities, the ILS approach outperformed BOTNet, maintaining the same level of accuracy as achieved when the possible AUV locations were within the grid. On the other hand, BOTNet's performance degraded as the AUV position moved outward from the original grid.
This result is not surprising since deep neural networks usually return large errors when training and test sets represent data from different distributions, as in this case. Thus, to overcome such situations, the training grid should be extended to include all possible AUV locations.

Influence of Number of Sensors
BOTNet is evaluated here against different datasets, each of which is created for a given number of sensors N ∈ [3,10], with each contributing a single LOS measurement.
For each possible number of sensors N, sensor locations were set to be the nodes of each polygon, in the horizontal plane, with N edges, as illustrated in Figure 10.

Conclusions
In this paper, an alternative method for estimating an unknown target location from noisy measurements arriving from a multisensor passive system was suggested instead of the ILS approach.
The proposed framework, BOTNet, uses a deep neural network created for the localization task. The network receives the known sensor locations and their LOS measurements and outputs the estimated target position vector.
Using the proposed framework has several advantages over the ILS approach. First, it is based on a deep learning algorithm, which is a data-driven approach. Therefore, it implicitly learns the data's distribution and can learn specific parameters relevant to a certain scenario. In addition, deep networks enjoy an inherent property of denoising the noisy measured LOS, which improves the accuracy of the proposed approach. Finally, in contrast with ILS, no initial guess or initialization process is required. On the other hand, a data-driven approach requires a preliminary stage of collecting or constructing the dataset and then training the network. In addition, the input to the network or ILS is the sensors' known locations and their LOS measurements. While, for the ILS, the number of sensors can vary during operation, it must remain fixed for BOTNet as the network is trained for a specific input dimension.
To demonstrate the benefits of the proposed approach, an AUV docking scenario was considered. There, passive sensors are placed near the docking station to provide LOS measurements to the approaching AUV.
Several cases were examined to evaluate BOTNet. To that end, a three-dimensional rectangular cuboid grid representing possible AUV locations near the docking station was created. In all examined cases, BOTNet achieved a lower RMSE compared with ILS, except for the scenario where the AUV locations were taken outside the grid. The latter can be circumvented by defining a larger grid for the training dataset. Although the paper focuses on an AUV localization task, BOTNet can be easily adapted to other bearings-only tracking scenarios, such as tracking air platforms or tracking a member in a formation, or to other domains that currently use an iterative least squares based approach, like 3D model compression.
The implementation of the proposed framework is available at https://github.com/Hadar-Sha/BOTNet, accessed on 31 May 2021.
Author Contributions: H.S. and I.K. contributed equally to the paper except for the simulation and visualization that were made by H.S. All authors have read and agreed to the published version of the manuscript.