Article

Observability and Information Bounds in UUV Relative Navigation from Range-Rate

Faculty of Navigation and Naval Weapon, Polish Naval Academy, 81-127 Gdynia, Poland
Appl. Sci. 2026, 16(8), 3758; https://doi.org/10.3390/app16083758
Submission received: 29 January 2026 / Revised: 9 April 2026 / Accepted: 9 April 2026 / Published: 11 April 2026

Abstract

In this paper, we investigate the relative navigation of two underwater vehicles in a leader–follower configuration when the only available inter-vehicle acoustic measurement is Doppler-derived range-rate, i.e., the rate of change in range, with no direct range measurement. We show that, in this setting, estimation performance depends critically on motion geometry: under unfavorable configurations and overly “radial” relative motion, some uncertainty components cannot be effectively reduced, and the available information decays rapidly as the separation increases. We propose a practical, quantitative approach to assessing these effects over time, based on information measures computed in a sliding time window and the corresponding theoretical accuracy bounds. Building on this, we construct information maps for representative maneuvers that highlight regions of “good” and “poor” geometry and explain when and why the estimator loses stability. We complement Monte Carlo simulation results with a reinforcement learning experiment in which a control policy learns to both maintain the formation and generate maneuvers that improve estimation conditions in the Doppler-only regime. The results demonstrate that motion control explicitly accounting for trajectory informativeness can significantly increase task success compared with control strategies that ignore these limitations.

1. Introduction

Unmanned/autonomous underwater vehicles (UUVs/AUVs) are undertaking increasingly complex inspection, observation, survey, and cooperative missions in environments where Global Navigation Satellite System (GNSS) signals are unavailable and communication and sensing are constrained by acoustic propagation. In practice, underwater navigation is typically based on dead reckoning, for example, an inertial navigation system (INS) aided by a Doppler velocity log (DVL) and heading sensors [1]. A DVL is an onboard sensor used to estimate the vehicle’s velocity relative to the seabed (or, in some modes, relative to the surrounding water), whereas the observable studied in this paper is an inter-vehicle acoustic Doppler-derived range-rate along the line of sight between two platforms. However, uncertainty grows with dive duration due to drift, bias accumulation, and model imperfections. Broad surveys of AUV navigation and localization summarize the main sensor configurations—including INS/DVL integration, acoustic systems, and simultaneous localization and mapping (SLAM)—as well as the hardware–operational trade-offs that determine the choice of sensors and algorithms depending on the mission scenario [2,3,4,5]. In many practical missions, the key quantity is not the absolute position in a global reference frame, but the relative position between platforms. Typical examples include formation keeping, cooperative mapping, docking, navigation assistance, and leader–follower operation. For such tasks, the underwater cooperative navigation literature shows that useful relative estimates can be obtained even with limited inter-vehicle acoustic information, provided that the motion geometry and the estimation architecture are favorable [6,7,8]. Related cooperative formulations have also explored alternative geometries and estimation frameworks, including tetrahedral multi-AUV configurations and moving-horizon estimation [9,10]. 
At the same time, classical acoustic aiding systems such as long baseline (LBL), short baseline (SBL), and ultra-short baseline (USBL) require external infrastructure, deployment effort, and operational constraints.
Consequently, a substantial body of underwater navigation research has moved toward infrastructure-reduced solutions, including localization using a single beacon, range-only/single-beacon schemes, pseudo-LBL methods, and one-way travel-time (OWTT) navigation [11,12,13,14,15,16,17,18]. In parallel, a related research line has examined observability in relative/range-based underwater localization. Examples include analyses of relative localization based on ranging and depth measurements, observability studies for single-range cooperative underwater vehicles, and observability-aware trajectory design for range-only/single-beacon settings [19,20,21,22,23,24,25,26]. These works are especially relevant here because they show that, in underwater localization, estimation quality may depend as much on motion geometry as on the sensing modality itself.
In this work, we focus on UUV relative navigation in a leader–follower setting in which the only inter-vehicle acoustic observation is range-rate information obtained from Doppler shift (Doppler-only). Such a measurement can be practically attractive in situations where the time-of-flight (ToF) range is unavailable or unreliable in the navigation stack, or where exploiting ToF would require additional system assumptions that are not met in the considered scenario. In particular, OWTT ranging typically requires accurate clock synchronization (or very accurate time stamping) between vehicles; maintaining such synchronization underwater can be costly and may be difficult due to clock drift, intermittent communication, and latency [16,17,18]. Indicatively, Eustice et al. report commercially available oscillators suitable for AUV applications with typical drift levels around 20 ns/s for low-power TCXO units and 2.8 ns/s for an OCXO, and note that for dives exceeding about 14 h, the drift of the TCXOs used in their OWTT system can approach 1 ms and thereby introduce approximately 1.5 m of pseudo-range bias if left uncompensated [17]. In this sense, our point is not that OWTT is generally unsuitable, but rather that in the present scenario, we do not assume the synchronization quality, timestamp accuracy, or ToF reliability required to use it as a dependable navigation observable. While two-way travel time (TWTT) can relax strict synchronization, it requires a request–response handshake and introduces additional protocol overhead and turnaround delays, which may be incompatible with low-duty-cycle links or formation-control update rates. Moreover, even if raw ToF observables exist at the modem level, they may not be exposed to the user or may be too biased or variable (for example, due to multipath, imperfect detection, variable processing delays, sound-speed mismatch, or synchronization errors) to serve as a dependable range input [3,18,27,28].
By contrast, Doppler is available as relative-motion information in many acoustic architectures.
On the other hand, Doppler-only is a one-dimensional measurement: it informs about the rate of closing/opening, but it does not provide the range directly. As a result, the quality of relative-state estimation becomes strongly dependent on motion geometry and may degenerate in typical situations of overly radial motion or insufficient variation of geometry over time. Doppler-based underwater localization has been studied from several viewpoints, including joint localization and time synchronization, AUV-aided localization based on Doppler shift, and the fundamental limits of Doppler-, time-of-arrival (ToA)-, and time-difference-of-arrival (TDoA)-based localization [27,28,29,30,31]. However, the specific case considered in this paper—leader–follower relative navigation in which the only inter-vehicle acoustic observable is Doppler-derived range-rate and no direct range measurement is available—remains insufficiently characterized from the viewpoint of finite-horizon information content, geometry-induced degeneracies, and motion-design implications.
This leads to the central problem of this paper: in Doppler-only, observability and practical identifiability of the relative position are not fixed properties of the sensor alone, but rather depend on the trajectory and maneuvers over a finite time horizon. In leader–follower tasks, an additional objective conflict emerges: on the one hand, the follower should stabilize the desired geometric relationship and maintain formation; on the other hand, it must periodically excite the measurement geometry in order to avoid weakly informative states. Similar geometric degeneracies and the need for excitation are well known in range-only and single-range problems, where classes of motions leading to poor observability have been identified and trajectory-planning or control methods have been proposed to improve estimation conditions [19,20,21,22,23,24,25,26]. In this context, a natural question becomes: How can the information limitations of Doppler-only be described and visualized as a function of geometry and maneuver type, and how can motion be designed to mitigate these limitations without losing the ability to maintain formation?
Recent cooperative underwater navigation studies have also broadened the design space by considering dynamic-process-model-based acoustic navigation without a DVL, anchor-free multi-AUV cooperative localization, and current-aided cooperative localization and tracking [32,33,34]. These works confirm the practical importance of estimator-aware motion and alternative information sources in underwater cooperation. At the same time, they differ from the present paper in a fundamental way: they rely on range, bearing, current-field information, or richer cooperative sensing structures, whereas our focus is on the more restrictive Doppler-only range-rate case without direct range and without auxiliary absolute references.
The research problem addressed in this paper is how to quantify and mitigate the geometry-induced loss of information that arises in leader–follower relative navigation when the only inter-vehicle acoustic measurement is Doppler-derived range-rate. The practical implication is important: a controller may appear successful from the formation-keeping viewpoint while simultaneously driving the estimator into a weakly informative regime, for example, when the relative motion becomes too radial or too closely matched to the leader. Accordingly, this paper addresses two concrete questions:
  • (Q1) How does motion geometry affect finite-horizon informativeness and theoretical accuracy bounds in Doppler-only relative navigation?
  • (Q2) Can motion strategies that explicitly preserve informative geometry improve leader–follower task performance relative to geometry-agnostic formation control?
Our main hypothesis is that finite-horizon estimation performance in Doppler-only leader–follower navigation is governed primarily by (i) the transverse component of the relative motion and (ii) the inter-vehicle separation. More specifically, maneuvers that maintain sufficiently strong and directionally diverse transverse excitation over a sliding time window should increase the smallest eigenvalue of the information matrix, reduce the corresponding lower bound on estimation error, and improve closed-loop task success compared with geometry-agnostic formation control.
To address this problem, we adopt an information-theoretic perspective. We use the Fisher information matrix (FIM) and the associated Cramér–Rao lower bound (CRLB), which make it possible to quantify how a sequence of observations and the motion geometry translate into the theoretically achievable estimation accuracy for a given noise level [35,36]. Unlike purely binary observability assessments, we are primarily interested in the practical, time-varying informativeness of the problem. Therefore, we employ a windowed (sliding-window) FIM/CRLB, which matches natural estimation and control horizons and allows us to detect temporally localized episodes of information degeneracy. On this basis, we construct information maps for representative maneuvers (e.g., straight-line motion, constant turn, sinusoidal maneuver), which compactly reveal regions of good and poor geometry and help interpret the behavior of the estimator and controller. This viewpoint is consistent with a line of work in which trajectory planning and active information gathering are formulated using criteria based on FIM/CRLB [37,38].
Analytically designing informative control strategies can be difficult, especially when the controller must satisfy multiple competing objectives, such as formation keeping, safety, dynamic constraints, and sensing constraints. For this reason, we incorporate reinforcement learning (RL) as a practical mechanism for automatically constructing control policies that simultaneously stabilize the formation and generate excitation favorable for estimation in the Doppler-only regime. In the RL experiment, we use the Soft Actor–Critic (SAC) algorithm [39,40] to learn a follower policy under partial observability: the policy inputs are quantities available online (the relative estimate and uncertainty measures from the filter), with no access to the true position coordinates in either the reward or the success criterion. This approach is consistent with the broader direction of RL methods in robotics and active perception/information-gathering tasks [41,42,43]. Related DRL-based adaptation has also been explored in adjacent underwater acoustic-system problems, such as resource-aware fountain coding in resource-constrained UASNs [44]; however, these works address communication efficiency rather than geometry-limited relative navigation. In our formulation, RL does not replace the information analysis; instead, we treat it as a trajectory generator and as a point of reference for comparison with classical control strategies, which we subsequently interpret using FIM/CRLB and error metrics computed against the ground truth (only for evaluation).
A concise positioning of the present paper relative to the most relevant research lines is given in Table 1.
The remainder of the paper is organized as follows. The next section summarizes the contributions. We then present the relative kinematics model and the Doppler-only measurement model, together with the simulation assumptions. Next, we discuss observability conditions and degeneracies, define the information measures (FIM/CRLB), and explain how the information maps are generated. The later sections describe the estimator, motion strategies, and simulation protocol, followed by results, discussion, and conclusions.

2. Contributions of This Work

This paper addresses the following question: How can the finite-horizon information limitations of Doppler-only leader–follower relative navigation be quantified, and how can motion strategies mitigate them without sacrificing the formation objective? Guided by the research problem and hypothesis stated in the Introduction, the main contributions are as follows:
  • We identify the key geometry-induced limitations of Doppler-only (range-rate) measurements in leader–follower relative navigation, including degeneracy under predominantly radial motion, matched-motion episodes, insufficient directional diversity, and information decay with increasing separation.
  • We formulate sliding-window information measures based on the FIM/CRLB as a quantitative notion of finite-horizon informativeness (or “observability over time”) and use them to detect episodes in which the recent trajectory is not informative enough to meaningfully reduce uncertainty.
  • We construct information maps for representative maneuvers and use them as an offline diagnostic and design-support tool to compare motion scenarios in terms of their achievable estimation accuracy in the Doppler-only regime.
  • We validate these findings in paired Monte Carlo simulations by combining information metrics with task-level performance measures and by comparing motion strategies (baseline, baseline with explicit excitation, and an RL-trained policy) in the trade-off between formation keeping and estimator-informative excitation.

3. Problem Formulation

Figure 1 summarizes the geometry and notation used in this section. Table 2 gathers all the core notation used in the article. The inertial/world frame is denoted by $\mathcal{F}_W$ (ENU), while $\mathcal{F}_L$ is a leader-fixed frame attached to the leader and aligned with the leader heading. The follower and leader positions are $\mathbf{p}_F$ and $\mathbf{p}_L$, the relative position is $\mathbf{r} = \mathbf{p}_L - \mathbf{p}_F$, the inter-vehicle range is $\rho = \|\mathbf{r}\|$, and the line-of-sight (LOS) unit vector is $\mathbf{u} = \mathbf{r}/\rho$. The follower and leader velocities are $\mathbf{v}_F$ and $\mathbf{v}_L$, respectively, and the relative velocity is $\mathbf{v}_{rel} = \mathbf{v}_L - \mathbf{v}_F$. The desired follower offset relative to the leader is defined in the leader-fixed frame as $\mathbf{r}_{des}^{L}$ and represented in the world frame as $\mathbf{r}_{des}^{W} = R(\psi_L)\,\mathbf{r}_{des}^{L}$.
Throughout the paper, $\mathcal{N}(\mu, \Sigma)$ denotes a Gaussian distribution with mean $\mu$ and covariance $\Sigma$. In the scalar case, $\mathcal{N}(0, \sigma^2)$ denotes zero-mean Gaussian noise with variance $\sigma^2$.

3.1. Geometry, Reference Frames, and Basic Definitions

As illustrated in Figure 1, we consider two underwater vehicles operating in the horizontal plane (2D): a leader (L) and a follower (F). Their position vectors in the inertial reference frame $\mathcal{F}_W$ (ENU: x–East, y–North) are denoted by $\mathbf{p}_L(t), \mathbf{p}_F(t) \in \mathbb{R}^2$. The relative state to be estimated is the leader position relative to the follower:
$$\mathbf{r}(t) = \mathbf{p}_L(t) - \mathbf{p}_F(t) \in \mathbb{R}^2,$$
$$\rho(t) = \|\mathbf{r}(t)\|, \qquad \mathbf{u}(t) = \frac{\mathbf{r}(t)}{\rho(t)}.$$
The relative velocity is defined as follows:
$$\mathbf{v}_{rel}(t) = \mathbf{v}_L(t) - \mathbf{v}_F(t).$$
The formation objective is to maintain a prescribed geometric relationship with respect to the leader. We assume that the desired relative vector is defined in the leader-fixed frame $\mathcal{F}_L$ as $\mathbf{r}_{des}^{L}$ (in our experiments: a constant offset along the leader “forward” axis). Here, $\mathcal{F}_L$ denotes the leader-fixed frame shown in Figure 1; its $x_L$ axis is aligned with the leader heading and is used to define the desired relative offset with respect to the leader body. Its representation in the world frame is:
$$\mathbf{r}_{des}^{W}(t) = R(\psi_L(t))\,\mathbf{r}_{des}^{L},$$
where $\psi_L$ denotes the leader heading, and the 2D rotation matrix is given by:
$$R(\psi) = \begin{bmatrix} \cos\psi & -\sin\psi \\ \sin\psi & \cos\psi \end{bmatrix}.$$
In this section (and in the kinematic equations), $\psi$ is treated as a mathematical angle: $\psi = 0$ corresponds to the $+x$ axis (East), and angles increase counterclockwise, consistent with the notation $[\cos\psi, \sin\psi]^{\top}$. In the information maps, for a clearer “map-like” presentation, the angular axis is expressed as the navigational bearing $\beta$: 0° at the top (North), 90° to the right (East), and angles increase clockwise. The relationship between $\psi$ (mathematical angle) and $\beta$ (navigational bearing) is given in Section 5.4.
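The frame conventions above can be illustrated with a minimal Python sketch. It assumes only the definitions stated in this subsection (counterclockwise mathematical angle from East, clockwise bearing from North); the function names and the 10 m offset are hypothetical, and the conversion formula is derived here from those two conventions rather than quoted from Section 5.4.

```python
import math

def rotation(psi):
    """Return R(psi) as nested tuples; psi is a mathematical angle in radians."""
    c, s = math.cos(psi), math.sin(psi)
    return ((c, -s), (s, c))

def offset_in_world(psi_leader, r_des_leader):
    """Map the leader-frame offset r_des^L to the world frame: R(psi_L) r_des^L."""
    R = rotation(psi_leader)
    x, y = r_des_leader
    return (R[0][0] * x + R[0][1] * y, R[1][0] * x + R[1][1] * y)

def bearing_deg(psi):
    """Mathematical angle (0 = East, counterclockwise) to a navigational
    bearing in degrees (0 = North, clockwise), wrapped to [0, 360)."""
    return (90.0 - math.degrees(psi)) % 360.0

# Hypothetical example: leader heading East, offset of 10 m along its forward axis.
r_des_world = offset_in_world(0.0, (10.0, 0.0))
```

With the leader heading East the world-frame offset equals the leader-frame offset; turning the leader to North (a mathematical angle of 90°) rotates the same offset onto the North axis.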

3.2. Relative Kinematics (Discrete-Time Model)

In the kinematic (point-mass) model, we assume:
$$\dot{\mathbf{p}}_L(t) = \mathbf{v}_L(t), \qquad \dot{\mathbf{p}}_F(t) = \mathbf{v}_F(t).$$
This yields the relative kinematics:
$$\dot{\mathbf{r}}(t) = \mathbf{v}_{rel}(t) + \mathbf{w}(t),$$
where $\mathbf{w}(t)$ captures environmental and modeling disturbances (e.g., currents or motion-model mismatch).
After discretization with step $\Delta t$, we obtain a model consistent with the simulation and filtering implementation:
$$\mathbf{r}_{k+1} = \mathbf{r}_k + \Delta t\,\mathbf{v}_{rel,k} + \mathbf{w}_k,$$
where $\mathbf{w}_k \sim \mathcal{N}(0, Q_k)$ is the process noise. In the experiments and in the EKF-bank process model introduced later, we use the standard random-walk choice $Q_k = q^2 \Delta t\, I_2$, corresponding to covariance proportional to $\Delta t$.
The velocity vector of any vehicle $i \in \{L, F\}$ is parameterized by its scalar speed and heading:
$$\mathbf{v}_i(t) = v_i(t) \begin{bmatrix} \cos\psi_i(t) \\ \sin\psi_i(t) \end{bmatrix}.$$
In the leader–follower scenario considered here, the leader moves with constant speed and heading within each episode. By contrast, the follower is controlled at discrete time steps and may change its commanded speed and heading subject to actuator constraints. Therefore, although the leader motion is constant at the episode level, the follower motion may vary within the finite Doppler integration window discussed in Section 3.3.
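As a minimal sketch of the discrete-time model above, the following Python fragment propagates the relative position by one step. The function names, the numerical values, and the use of a seeded `random.Random` generator are illustrative assumptions; only the update rule and the choice of process covariance $q^2 \Delta t\, I_2$ come from the text.

```python
import math
import random

def velocity(speed, psi):
    """Velocity vector from scalar speed and heading (mathematical angle, radians)."""
    return (speed * math.cos(psi), speed * math.sin(psi))

def step_relative(r, v_leader, v_follower, dt, q, rng):
    """One step of r_{k+1} = r_k + dt * v_rel + w_k with Q_k = q^2 * dt * I_2."""
    v_rel = (v_leader[0] - v_follower[0], v_leader[1] - v_follower[1])
    sigma_w = q * math.sqrt(dt)  # per-axis standard deviation implied by Q_k
    return (r[0] + dt * v_rel[0] + rng.gauss(0.0, sigma_w),
            r[1] + dt * v_rel[1] + rng.gauss(0.0, sigma_w))

# Hypothetical example: leader heading East, follower heading North, both at 1 m/s,
# process noise switched off (q = 0) so the step is deterministic.
r_next = step_relative((100.0, 0.0), velocity(1.0, 0.0), velocity(1.0, math.pi / 2),
                       dt=1.0, q=0.0, rng=random.Random(0))
```

Setting `q` to zero recovers the noise-free kinematics, which is convenient for checking the sign convention of the relative velocity before noise is added.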

3.3. Doppler Measurement Model (Range-Rate)

We assume that an inter-vehicle acoustic observation is available that enables estimating the Doppler shift of the signal, which, after appropriate scaling, is interpreted as the range-rate along the LOS direction u, i.e., the unit vector connecting the vehicles. We consider the Doppler-only variant, without a direct time-of-flight range measurement.
It is instructive to contrast the underlying “observability manifolds” for range-only and range-rate-only sensing. In a range-only setup, each measurement constrains the relative position to a circle (in 2D) centered at the follower, i.e., $\|\mathbf{r}(t_m)\| = \rho_m$, and over time, the intersection of such circles under varying geometry can localize $\mathbf{r}$. In contrast, in range-rate-only (Doppler-only), each measurement constrains the state through the scalar projection:
$$s_m = -\mathbf{u}^{\top} \mathbf{v}_{rel},$$
i.e., it depends on the instantaneous LOS direction and the relative velocity rather than on $\rho$ itself. As a result, the informativeness is strongly geometry-dependent: if the relative motion is nearly radial, $\mathbf{v}_{rel} \parallel \mathbf{u}$, then small perturbations of $\mathbf{r}$ barely change the projection $\mathbf{u}^{\top}\mathbf{v}_{rel}$ and the Doppler Jacobian becomes small, whereas a significant transverse component of $\mathbf{v}_{rel}$ produces sensitivity through changes of the LOS direction. Therefore, unlike range-only navigation where position diversity may suffice, Doppler-only requires transverse excitation (sufficient sideways motion relative to the LOS) to build information about $\mathbf{r}$ over time.
Let $t_m = m T_s$ denote the measurement instants, where $T_s$ is the Doppler sampling period. The range derivative is:
$$\dot{\rho}(t) = \mathbf{u}(t)^{\top} \mathbf{v}_{rel}(t).$$
We adopt an additive measurement model with independent Gaussian noise:
$$s_m = h(\mathbf{r}(t_m), \mathbf{v}_{rel}(t_m)) + \nu_m,$$
$$h(\mathbf{r}, \mathbf{v}_{rel}) = -\mathbf{u}^{\top} \mathbf{v}_{rel}, \qquad \mathbf{u} = \frac{\mathbf{r}}{\|\mathbf{r}\|},$$
$$\nu_m \sim \mathcal{N}(0, \sigma_s^2), \qquad \nu_m \ \text{i.i.d.}$$
The sign in the measurement model follows the closing-speed convention: $s_m > 0$ indicates closing (decreasing range, $\dot{\rho} < 0$), whereas $s_m < 0$ indicates opening (increasing range). For speeds much smaller than the speed of sound in water, the Doppler relationship can be treated as linear; therefore, in the remainder we work directly with the range-rate model.
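The measurement model and its sign convention can be sketched in a few lines of Python. The function names and numerical values are hypothetical; the sketch assumes the closing-speed convention stated above, so the noise-free measurement is the negative of the range-rate.

```python
import math
import random

def range_rate(r, v_rel):
    """Range derivative: projection of v_rel onto the LOS unit vector u = r / |r|."""
    rho = math.hypot(r[0], r[1])
    return (r[0] * v_rel[0] + r[1] * v_rel[1]) / rho

def doppler_sample(r, v_rel, sigma_s=0.0, rng=None):
    """Closing-speed measurement: s = -(range rate) + Gaussian noise.
    Without a generator the noise-free value is returned."""
    noise = rng.gauss(0.0, sigma_s) if rng is not None else 0.0
    return -range_rate(r, v_rel) + noise

# Leader 100 m East of the follower; relative velocity pointing back toward the follower.
s_closing = doppler_sample((100.0, 0.0), (-1.0, 0.0))      # range decreasing
s_transverse = doppler_sample((100.0, 0.0), (0.0, 1.0))    # purely sideways motion
noisy = doppler_sample((100.0, 0.0), (-1.0, 0.0), sigma_s=0.05, rng=random.Random(0))
```

A closing geometry gives a positive sample, while purely transverse relative motion gives a noise-free sample of zero even though it is the informative case for position.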
In practice, the Doppler estimate provided by an acoustic receiver is typically obtained by processing the signal over a finite integration window of duration $T_{int}$. If the follower changes its speed and/or heading appreciably within that window, the reported Doppler corresponds to an effective time-averaged range-rate rather than the instantaneous value at $t_m$. A first-order approximation of this effect is:
$$\bar{\dot{\rho}}_m \triangleq \frac{1}{T_{int}} \int_{t_m - T_{int}/2}^{t_m + T_{int}/2} \dot{\rho}(t)\, dt = \frac{\rho\!\left(t_m + \tfrac{T_{int}}{2}\right) - \rho\!\left(t_m - \tfrac{T_{int}}{2}\right)}{T_{int}},$$
so that the reported Doppler can be interpreted as follows:
$$s_m^{eff} \approx -\bar{\dot{\rho}}_m + \nu_m^{eff} = -\frac{1}{T_{int}} \int_{t_m - T_{int}/2}^{t_m + T_{int}/2} \mathbf{u}(t)^{\top} \mathbf{v}_{rel}(t)\, dt + \nu_m^{eff},$$
where $\nu_m^{eff}$ is an effective error term that absorbs residual nonidealities associated with receiver processing, multipath, Doppler broadening, and other channel-dependent effects. In a more general channel-dependent interpretation, the uniform average above may be replaced by a weighted average over the integration interval; however, the present work does not model that weighting explicitly. When the instantaneous model above is subsequently used in a recursive nonlinear filter update (later instantiated in this work as an EKF bank; see Section 6), such intra-window variability appears as a model mismatch and can be interpreted as an additional contribution to the effective measurement uncertainty. In the simulations, we assume that Doppler updates are frequent enough and the motion is sufficiently smooth so that the instantaneous model is a good approximation; residual effects can be handled by conservative tuning or inflation of the measurement covariance, or by extending the motion model to explicitly include acceleration.
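To give a feel for the size of the intra-window mismatch, the sketch below compares the uniform window average of the range-rate with its instantaneous value. The relative track, the window lengths, and all function names are hypothetical choices made for illustration; the paper does not specify them.

```python
import math

def rho(t):
    """Range along a hypothetical relative track r(t) = (50, 2 t) m."""
    return math.hypot(50.0, 2.0 * t)

def instantaneous_range_rate(t):
    """Analytic d(rho)/dt for the track above."""
    return 4.0 * t / rho(t)

def averaged_range_rate(t_m, t_int):
    """Uniform average of the range-rate over [t_m - T_int/2, t_m + T_int/2],
    evaluated as a range difference divided by the window length."""
    return (rho(t_m + t_int / 2.0) - rho(t_m - t_int / 2.0)) / t_int

t_m = 10.0
mismatch_short = abs(averaged_range_rate(t_m, 2.0) - instantaneous_range_rate(t_m))
mismatch_long = abs(averaged_range_rate(t_m, 20.0) - instantaneous_range_rate(t_m))
```

On this track the mismatch grows with the window length, which is one way to motivate the covariance inflation mentioned above when the window is long relative to the time scale of the geometry change.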
The measurement model is one-dimensional: at a single time instant, it provides only information about the radial component of the relative motion and does not contain direct information about the range $\rho$. Consequently, Doppler-only relative navigation has several fundamental limitations: (i) a single Doppler sample provides at most rank-one information about the 2D relative position and no direct information about the radial distance; (ii) the problem is strongly geometry-dependent and becomes ill-conditioned in (nearly) radial motion, where $v_\perp \to 0$ and the Jacobian vanishes; (iii) informativeness degrades with increasing separation due to the scaling $\|H\| \propto 1/\rho$ (hence information $\propto 1/\rho^2$); (iv) since the measurement depends on the LOS direction and the relative velocity, distinct relative configurations can yield similar Doppler histories unless the trajectory provides persistent, diverse excitation over time.
The basic geometric dependencies are illustrated in Figure 2.

3.4. Assumptions and Separation Between “Ground Truth” and Online-Available Information

In the subsequent analysis and simulation experiments, we make the following assumptions: (i) the only inter-vehicle observation is the range-rate measurement defined above (no direct range measurement); (ii) the leader velocity $\mathbf{v}_L$ is treated as an externally available input to the follower (e.g., communicated by the leader or known from a pre-shared motion plan), and in the baseline experiments, this input is assumed error-free; (iii) the follower has access to its own speed and heading in noisy form, which results in a noisy reconstruction $\tilde{\mathbf{v}}_F$ and hence $\tilde{\mathbf{v}}_{rel} = \mathbf{v}_L - \tilde{\mathbf{v}}_F$ used by the estimator and the controller; (iv) motion-model uncertainty and environmental disturbances are captured by the process term $\mathbf{w}_k$ in the discrete relative-motion model.
Assumption (ii) is a deliberate idealization introduced to isolate the geometry-induced limits of Doppler-only relative-position estimation from additional uncertainty associated with estimating the leader motion online. In other words, the follower does not infer $\mathbf{v}_L$ from the Doppler measurement itself; rather, Doppler is used to estimate the relative geometry $\mathbf{r}$, while $\mathbf{v}_L$ is treated as an external input available to the navigation loop. In realistic deployments, the leader-reported velocity may be affected by sensor noise/bias, communication latency, intermittent updates, or timestamp misalignment. In such cases, the follower uses a noisy and delayed $\tilde{\mathbf{v}}_L$, which perturbs the Doppler prediction and degrades the achievable information level quantitatively. Importantly, however, the structural dependence of the Doppler Jacobian on the transverse component $\mathbf{v}_\perp$ and the $1/\rho$ and $1/\rho^2$ information scaling remain intrinsic properties of Doppler-only sensing.
Assumption (iv) concerns process/model uncertainty in the relative-motion dynamics. It does not replace the direct Doppler measurement noise term $\nu_m$ in the measurement model. In practice, however, process/model mismatch and imperfect reconstruction of $\tilde{\mathbf{v}}_{rel}$ may also appear in the innovation statistics, which motivates conservative tuning or inflation of the effective measurement covariance in the filter implementation.
A key point is to distinguish “true” quantities (available only in the simulator) from what is available online to the estimator and controller. For evaluation purposes, the simulator provides the ground truth: $\mathbf{p}_L$, $\mathbf{p}_F$, $\mathbf{r}$, the exact $\mathbf{v}_L$, $\mathbf{v}_F$, $\mathbf{v}_{rel}$, and the noise-free signal $s_{true}$. Based on these, we report ground-truth metrics (e.g., the formation error $\|\mathbf{r} - \mathbf{r}_{des}^{W}\|$) and compute information measures (FIM/CRLB) implied by the measurement model.
The signals available online include: (i) the Doppler measurement $s_m$; (ii) the known/communicated leader velocity $\mathbf{v}_L$; (iii) follower dead reckoning (noisy speed and heading, hence $\tilde{\mathbf{v}}_F$); (iv) the output of the relative estimator (the estimate $\hat{\mathbf{r}}$ and an uncertainty measure, e.g., the covariance). Importantly, in our experiments, the controller (both the baseline strategy and the RL-trained policy) does not use the ground truth for decision making; quantities computed on ground truth are used solely for evaluation and for interpreting Doppler-only information limitations.
A missing Doppler estimate (e.g., due to link dropouts or low estimation quality) is treated as a missing measurement at a given instant $t_m$, which we model in the experiments via a validity/gating mechanism at the estimator level.

3.5. Jacobian and Geometry “Sensitivity”

A key property of the Doppler-only measurement is that it depends on the state $\mathbf{r}$ only through the line-of-sight direction $\mathbf{u} = \mathbf{r}/\rho$ (and not through the range $\rho$ itself). Consequently, information about position becomes available only when a change in $\mathbf{r}$ produces a noticeable change in the measured projection, i.e., when the relative motion contains a component perpendicular to $\mathbf{u}$.
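For the measurement function:
$$h(\mathbf{r}, \mathbf{v}_{rel}) = -\mathbf{u}^{\top} \mathbf{v}_{rel} = -\frac{\mathbf{r}^{\top} \mathbf{v}_{rel}}{\rho},$$
the Jacobian with respect to the state $\mathbf{r}$ (a $1 \times 2$ row vector) is:
$$H(\mathbf{r}, \mathbf{v}_{rel}) = \frac{\partial h}{\partial \mathbf{r}} = -\frac{1}{\rho} \left( \mathbf{v}_{rel} - (\mathbf{u}^{\top} \mathbf{v}_{rel})\, \mathbf{u} \right)^{\top}.$$
The term in parentheses is the projection of $\mathbf{v}_{rel}$ onto the subspace orthogonal to the LOS direction $\mathbf{u}$. Introducing the projection:
$$\mathbf{v}_\perp \triangleq (I_2 - \mathbf{u}\mathbf{u}^{\top})\, \mathbf{v}_{rel}, \qquad v_\perp \triangleq \|\mathbf{v}_\perp\|,$$
we can write the Jacobian in the following compact form:
$$H(\mathbf{u}, \mathbf{v}_{rel}) = -\frac{1}{\rho}\, \mathbf{v}_\perp^{\top}.$$
At measurement instant $t_m$, we denote by:
$$H_m \triangleq H\big(\mathbf{r}(t_m), \mathbf{v}_{rel}(t_m)\big)$$
the Doppler measurement Jacobian evaluated at that instant.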
The equation above leads to a simple measure of geometric sensitivity:
$$\|H\|_2 = \frac{\|\mathbf{v}_\perp\|}{\rho} = \frac{v_\perp}{\rho}.$$
Here $\|\cdot\|_2$ denotes the induced matrix 2-norm (spectral norm). Since $H \in \mathbb{R}^{1 \times 2}$ is a row vector, this coincides in the present case with the Frobenius norm, i.e., $\|H\|_2 = \|H\|_F = \sqrt{H H^{\top}} = \|\mathbf{v}_\perp\|/\rho = v_\perp/\rho$. If $\phi$ denotes the angle between $\mathbf{r}$ and $\mathbf{v}_{rel}$, then $v_\perp = \|\mathbf{v}_{rel}\|\,|\sin\phi|$, which immediately highlights two basic mechanisms of Doppler-only degeneracy: (i) lack of transverse excitation ($\phi \to 0$ or $\pi$, hence $v_\perp \to 0$), and (ii) loss of informativeness as range increases ($\rho \to \infty$).
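The closed-form Jacobian can be checked numerically. The following sketch, with hypothetical function names and test values, evaluates the analytic expression and compares it with a central finite difference of the measurement function under the closing-speed sign convention used in this paper.

```python
import math

def h(r, v_rel):
    """Noise-free Doppler measurement: negative projection of v_rel on the LOS."""
    rho = math.hypot(r[0], r[1])
    return -(r[0] * v_rel[0] + r[1] * v_rel[1]) / rho

def jacobian(r, v_rel):
    """Analytic H = -(1/rho) * v_perp^T, with v_perp the transverse part of v_rel."""
    rho = math.hypot(r[0], r[1])
    u = (r[0] / rho, r[1] / rho)
    radial = u[0] * v_rel[0] + u[1] * v_rel[1]
    v_perp = (v_rel[0] - radial * u[0], v_rel[1] - radial * u[1])
    return (-v_perp[0] / rho, -v_perp[1] / rho)

def numeric_jacobian(r, v_rel, eps=1e-4):
    """Central finite difference of h with respect to each component of r."""
    dx = (h((r[0] + eps, r[1]), v_rel) - h((r[0] - eps, r[1]), v_rel)) / (2 * eps)
    dy = (h((r[0], r[1] + eps), v_rel) - h((r[0], r[1] - eps), v_rel)) / (2 * eps)
    return (dx, dy)

# Hypothetical geometry: range 100 m, transverse speed 1 m/s.
r, v_rel = (80.0, 60.0), (1.0, -0.5)
H = jacobian(r, v_rel)
H_num = numeric_jacobian(r, v_rel)
H_norm = math.hypot(H[0], H[1])
```

For these values the transverse speed is 1 m/s at a range of 100 m, so the Jacobian norm is 0.01 s⁻¹, and the product of the Jacobian with the relative position is zero, reflecting the absence of local sensitivity in the radial direction.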
Excitation effectiveness and the need for directional diversity can be summarized as follows. For Doppler-only relative navigation, the usefulness of an update is governed by the transverse component of the relative motion:
$$\mathbf{v}_\perp = (I_2 - \mathbf{u}\mathbf{u}^{\top})\, \mathbf{v}_{rel},$$
whose magnitude is $v_\perp = \|\mathbf{v}_\perp\|$, and the Jacobian norm scales as $\|H\|_2 = v_\perp/\rho$. Since a single Doppler sample contributes a rank-one information increment:
$$J_m = H_m^{\top} \sigma_s^{-2} H_m \propto \frac{1}{\rho_m^2}\, \mathbf{v}_{\perp,m} \mathbf{v}_{\perp,m}^{\top},$$
it reduces uncertainty only along the direction associated with $\mathbf{v}_{\perp,m}$ and cannot directly reduce the radial component. Consequently, accurate estimation of $\mathbf{r} \in \mathbb{R}^2$ over a finite horizon requires both persistent transverse excitation (with $v_\perp$ not too small for many samples) and directional diversity over time, so that the set $\{\mathbf{v}_{\perp,m}\}$ spans more than one direction within the window (see Section 4.2).
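The role of directional diversity can be illustrated by summing rank-one increments for a few samples. The sketch below is a simplified illustration under stated assumptions: all samples are taken to constrain the same position parameter, process noise is ignored, and the geometries and the noise level of 0.05 m/s are hypothetical.

```python
import math

def jacobian(r, v_rel):
    """Doppler Jacobian H = -(1/rho) * v_perp^T as a 2-tuple."""
    rho = math.hypot(r[0], r[1])
    u = (r[0] / rho, r[1] / rho)
    radial = u[0] * v_rel[0] + u[1] * v_rel[1]
    return (-(v_rel[0] - radial * u[0]) / rho, -(v_rel[1] - radial * u[1]) / rho)

def information(samples, sigma_s):
    """Sum of rank-one increments H^T H / sigma_s^2.
    Returns (a, b, c) for the symmetric matrix [[a, b], [b, c]]."""
    a = b = c = 0.0
    for r, v_rel in samples:
        hx, hy = jacobian(r, v_rel)
        a += hx * hx / sigma_s ** 2
        b += hx * hy / sigma_s ** 2
        c += hy * hy / sigma_s ** 2
    return a, b, c

def eigenvalues(a, b, c):
    """Closed-form eigenvalues (smallest, largest) of a symmetric 2x2 matrix."""
    mean = 0.5 * (a + c)
    spread = math.hypot(0.5 * (a - c), b)
    return mean - spread, mean + spread

sigma_s = 0.05
one = [((100.0, 0.0), (0.0, 1.0))]              # single transverse sample
repeated = one * 2                              # same geometry twice
diverse = one + [((0.0, 100.0), (1.0, 0.0))]    # second sample from another direction

lam_one = eigenvalues(*information(one, sigma_s))
lam_repeated = eigenvalues(*information(repeated, sigma_s))
lam_diverse = eigenvalues(*information(diverse, sigma_s))
```

Repeating the same geometry doubles the largest eigenvalue but leaves the smallest at zero, whereas a second sample with a different transverse direction makes the smallest eigenvalue positive.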
For practical maneuvers, we define a quantitative transverse-excitation threshold by requiring:
$$v_\perp(t_m) \ge v_{\perp,\min}.$$
In our simulation setup, we use $v_{\perp,\min} = 0.25$ m s⁻¹ as a conservative gating threshold for Doppler updates. This value is consistent with the intuition from the Jacobian scaling above: when $v_\perp$ falls below this level, the Doppler Jacobian norm (and thus the incremental information) becomes small, and the estimation performance rapidly degrades.
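A gating rule of this kind might be sketched as follows. The threshold of 0.25 m/s is the value quoted above; the function names, the validity flag, and the choice to evaluate the gate on the estimated relative position and the reconstructed relative velocity are assumptions made for illustration, since the exact implementation is not given in this section.

```python
import math

V_PERP_MIN = 0.25  # m/s, gating threshold quoted in the text

def transverse_speed(r, v_rel):
    """Magnitude of the component of v_rel perpendicular to the LOS direction."""
    rho = math.hypot(r[0], r[1])
    u = (r[0] / rho, r[1] / rho)
    radial = u[0] * v_rel[0] + u[1] * v_rel[1]
    return math.hypot(v_rel[0] - radial * u[0], v_rel[1] - radial * u[1])

def accept_update(r_hat, v_rel_tilde, doppler_valid=True, v_perp_min=V_PERP_MIN):
    """Accept a Doppler update only if a valid sample exists and the
    (estimated) transverse speed reaches the threshold. Assumed logic."""
    return doppler_valid and transverse_speed(r_hat, v_rel_tilde) >= v_perp_min

# Hypothetical cases with the leader estimated 100 m East of the follower.
nearly_radial = accept_update((100.0, 0.0), (-1.0, 0.1))
oblique = accept_update((100.0, 0.0), (-0.5, 0.5))
dropout = accept_update((100.0, 0.0), (-0.5, 0.5), doppler_valid=False)
```

In this sketch a nearly radial approach with 0.1 m/s of sideways motion is rejected, an oblique approach with 0.5 m/s is accepted, and a missing sample is rejected regardless of geometry.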
Note that $\mathbf{H}\mathbf{r}=0$, since $\mathbf{v}_\perp\perp\mathbf{u}$. This means that a single Doppler measurement does not provide, in the sense of local linear sensitivity, information in the radial direction; its impact on position estimation appears primarily in the transverse direction through changes in the LOS direction. From an information-bound perspective, for measurement-noise variance $R=\sigma_s^2$, the information contribution of a single measurement scales as follows:
$$\mathbf{J}=\mathbf{H}^\top R^{-1}\mathbf{H}\;\propto\;\frac{v_\perp^2}{\rho^2\sigma_s^2},$$
which directly motivates the subsequent FIM/CRLB-based analysis and the design of maneuvers that ensure sufficient transverse excitation over time.
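The transverse projection, the Jacobian, and the gating test described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the function names, the example geometry, and the example velocity are assumptions made here and are not taken from the authors' implementation. Only the threshold of 0.25 m/s comes from the text.

```python
import math

def transverse_velocity(r, v_rel):
    """Return (v_perp, rho): transverse part of v_rel and the range, for 2-D tuples."""
    rho = math.hypot(r[0], r[1])
    ux, uy = r[0] / rho, r[1] / rho                 # unit LOS vector u = r / ||r||
    radial = ux * v_rel[0] + uy * v_rel[1]          # u^T v_rel (noise-free range-rate)
    v_perp = (v_rel[0] - radial * ux, v_rel[1] - radial * uy)
    return v_perp, rho

def doppler_jacobian(r, v_rel):
    """H = v_perp^T / rho: sensitivity of h(r) = u^T v_rel to perturbations of r."""
    v_perp, rho = transverse_velocity(r, v_rel)
    return (v_perp[0] / rho, v_perp[1] / rho)

V_PERP_MIN = 0.25            # m/s, gating threshold quoted in the text

r = (300.0, 400.0)           # hypothetical geometry, rho = 500 m
v_rel = (1.0, 0.0)           # hypothetical relative velocity, 1 m/s East
v_perp, rho = transverse_velocity(r, v_rel)
H = doppler_jacobian(r, v_rel)
v_perp_mag = math.hypot(v_perp[0], v_perp[1])

print("v_perp magnitude:", v_perp_mag)
print("H . r           :", H[0] * r[0] + H[1] * r[1])   # radial insensitivity
print("||H||           :", math.hypot(H[0], H[1]))      # equals v_perp / rho
print("passes gate     :", v_perp_mag >= V_PERP_MIN)
```

For this geometry the transverse speed is about 0.8 m/s, the product of the Jacobian with the position vector vanishes up to rounding, and the Jacobian norm equals the transverse speed divided by the range.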
Importantly, in this work, persistent excitation is understood as a finite-window condition on the recent motion geometry; it does not imply permanent oscillatory motion of the UUV. In practice, informative excitation may be provided by occasional turns, arc-like approach maneuvers, or temporary lateral corrections when estimator uncertainty is high. The practical requirement is therefore not “always oscillate”, but rather “avoid long weak-geometry intervals when uncertainty is large”.

4. Observability: Conditions and Degeneracies

In this section, we discuss when the relative state $\mathbf{r}$ is (practically) identifiable from Doppler-only observations, and which geometries lead to degeneracies. It is crucial to distinguish between two levels: (i) instantaneous observability (a single measurement provides only scalar information); (ii) observability over time, which may emerge only through changes in geometry and maneuvers (persistent excitation). In this work, we are primarily concerned with practical identifiability and estimation performance under stochastic measurement noise and finite time horizons, rather than only a binary (rank-based) notion of observability. Therefore, we complement the geometric discussion with an information-theoretic analysis using the Fisher information matrix (FIM) and the associated Cramér–Rao lower bound (CRLB), including a sliding-window formulation that captures time-varying informativeness. From a deterministic viewpoint, we treat $\mathbf{v}_{\mathrm{rel}}(t)$ as a known input (reconstructed online from the leader velocity and the follower dead reckoning), whereas in the practical viewpoint, we evaluate the “strength” of observability using information measures aggregated over a finite time horizon.
In this paper, practical identifiability does not mean binary local observability in the classical rank-test sense. Rather, it denotes the ability of the recent Doppler measurement history, over a finite time window and under the assumed noise level, to constrain the 2-D relative position strongly enough that the estimator can reduce uncertainty to a task-relevant level. We quantify this using the windowed Fisher information matrix $\mathbf{J}_w(t)$ and the associated bound $\mathbf{C}_w(t)$: when $\lambda_{\min}(\mathbf{J}_w)$ is close to zero or $\operatorname{tr}(\mathbf{C}_w)$ is very large, the recent trajectory is not informative enough to determine both components of $\mathbf{r}$ in a practically useful sense.

4.1. Instantaneous Observability: 1D Measurement and Rank-1 Information

For a single Doppler sample at time $t_m$ we have (cf. (11)):
$$s_m=\mathbf{u}^\top\mathbf{v}_{\mathrm{rel}}+\nu_m,\qquad \nu_m\sim\mathcal{N}(0,\sigma_s^2).$$
Assuming momentarily noise-free measurements and known $\mathbf{v}_{\mathrm{rel}}(t_m)\neq\mathbf{0}$, the measurement imposes:
$$\mathbf{u}(t_m)^\top\mathbf{v}_{\mathrm{rel}}(t_m)=s_m,$$
i.e., it fixes the value of the projection of $\mathbf{v}_{\mathrm{rel}}$ onto the line-of-sight (LOS) direction $\mathbf{u}$ (equivalently: the value of $\cos\phi_m$). This is a single constraint on the direction $\mathbf{u}=\mathbf{r}/\|\mathbf{r}\|$, and it does not determine the range $\rho(t_m)=\|\mathbf{r}(t_m)\|$, even when the measurement is noise-free. Consequently, there exist infinitely many vectors $\mathbf{r}$ consistent with a single measurement, and the uncertainty along the line of sight (the radial component, associated with $\rho$) cannot be reduced by a single observation.
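The measurement sensitivity to changes in the state is described by the Jacobian $\mathbf{H}=\partial h/\partial\mathbf{r}$ from (17):
$$\mathbf{H}(\mathbf{r},\mathbf{v}_{\mathrm{rel}})=\frac{1}{\rho}\,\mathbf{v}_\perp^\top,\qquad \mathbf{v}_\perp\triangleq\left(\mathbf{I}_2-\mathbf{u}\mathbf{u}^\top\right)\mathbf{v}_{\mathrm{rel}}.$$
Since $\mathbf{H}$ has size $1\times 2$, a single measurement can “excite” at most one direction in the state space. This is also evident in the contribution to the Fisher information matrix:
$$\mathbf{J}_m\triangleq\mathbf{H}_m^\top\,\sigma_s^{-2}\,\mathbf{H}_m=\frac{1}{\sigma_s^2\rho_m^2}\,\mathbf{v}_{\perp,m}\mathbf{v}_{\perp,m}^\top,$$
where $\rho_m=\|\mathbf{r}(t_m)\|$ and $\mathbf{v}_{\perp,m}=\mathbf{v}_\perp(t_m)$. By definition, $\mathbf{J}_m$ is the outer product of a vector with itself, and, therefore:
$$\operatorname{rank}(\mathbf{J}_m)=\begin{cases}1, & \text{if } \|\mathbf{v}_{\perp,m}\|>0,\\ 0, & \text{if } \|\mathbf{v}_{\perp,m}\|=0.\end{cases}$$
This means that a single Doppler measurement provides information in one direction (associated with $\mathbf{v}_{\perp,m}$), while the second direction remains locally unidentifiable. In particular, under a purely radial geometry ($\mathbf{v}_{\mathrm{rel}}\parallel\mathbf{u}$, i.e., $\mathbf{v}_\perp=\mathbf{0}$) the Jacobian vanishes, and the measurement carries no information about $\mathbf{r}$.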
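The rank statement can be verified numerically with a short sketch. The helper names and the noise level `sigma_s = 0.05` are illustrative assumptions (the noise value used in the paper is not stated in this section); the eigenvalues of the symmetric 2×2 increment are computed in closed form.

```python
import math

def fim_increment(r, v_rel, sigma_s):
    """J_m = v_perp v_perp^T / (sigma_s^2 rho^2), returned as a 2x2 nested tuple."""
    rho = math.hypot(r[0], r[1])
    ux, uy = r[0] / rho, r[1] / rho
    radial = ux * v_rel[0] + uy * v_rel[1]
    px, py = v_rel[0] - radial * ux, v_rel[1] - radial * uy   # v_perp
    c = 1.0 / (sigma_s ** 2 * rho ** 2)
    return ((c * px * px, c * px * py), (c * px * py, c * py * py))

def sym_eigvals(J):
    """Eigenvalues (ascending) of a symmetric 2x2 matrix, closed form."""
    a, b, d = J[0][0], J[0][1], J[1][1]
    mean = 0.5 * (a + d)
    half_gap = math.hypot(0.5 * (a - d), b)
    return mean - half_gap, mean + half_gap

sigma_s = 0.05                                             # hypothetical noise std, m/s
J_oblique = fim_increment((300.0, 400.0), (1.0, 0.0), sigma_s)
J_radial = fim_increment((500.0, 0.0), (1.0, 0.0), sigma_s)

print(sym_eigvals(J_oblique))   # smaller eigenvalue ~0: rank one
print(sym_eigvals(J_radial))    # both zero: purely radial motion, rank zero
```

In the oblique case the only nonzero eigenvalue equals $v_\perp^2/(\sigma_s^2\rho^2)$, which is the scalar information carried by the sample.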

4.2. Observability over Time: Geometric Variability and Persistent Excitation

Since a single observation is rank-1, full identification of the state $\mathbf{r}\in\mathbb{R}^2$ requires accumulation of information over time. For a sequence of measurements in the time window $W(t)\triangleq[t-T_w,\,t]$ (summing over indices $m$ such that $t_m\in W(t)$), the sum of contributions (24) takes the form:
$$\mathbf{J}_w(t)=\sum_{m:\,t_m\in W(t)}\mathbf{J}_m=\sum_{m:\,t_m\in W(t)}\frac{1}{\sigma_s^2\rho_m^2}\,\mathbf{v}_{\perp,m}\mathbf{v}_{\perp,m}^\top.$$
In 2D, matrix (26) is full rank if and only if the excitation directions $\{\mathbf{v}_{\perp,m}\}$ within the window are not all collinear. Equivalently, we require that the window contains at least two samples $m_1,m_2$ with nonzero transverse components such that:
$$\mathbf{v}_{\perp,m_1}\nparallel\mathbf{v}_{\perp,m_2},$$
which implies $\operatorname{rank}(\mathbf{J}_w)=2$ (and, numerically, a strictly positive smallest eigenvalue).
The geometric interpretation is simple, yet important for leader–follower control. First, it is not sufficient to have $v_\perp>0$ over a long time if $\mathbf{v}_\perp$ has an (approximately) constant direction: in that case, information accumulates mainly along the same direction, while the other direction remains poorly determined. Second, full identifiability over a finite time horizon requires persistent geometric excitation, i.e., an evolution of the relationship between the LOS direction $\mathbf{u}$ and the relative velocity $\mathbf{v}_{\mathrm{rel}}$ such that the vectors $\mathbf{v}_{\perp,m}$ in the window $W(t)$ span more than one direction in $\mathbb{R}^2$.
From a formal standpoint, nonlinear local observability tests (e.g., the Hermann–Krener criterion) analyze the rank of a matrix built from the gradients of the measurement function and its successive derivatives along the dynamics. For Doppler-only, the fact that the first gradient $\mathbf{H}$ is proportional to $\mathbf{v}_\perp^\top$ and scales as $1/\rho$ implies that (local) observability over a finite time interval depends on whether the trajectory generates nonzero and time-varying transverse components of the relative motion. In this work, instead of using the full Lie-theoretic machinery, we adopt a practical assessment based on aggregating the contributions in (26) over a finite time window, which directly connects to accuracy bounds (CRLB) in Section 5.
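The non-collinearity condition can be illustrated with two small sets of transverse-velocity samples. The sketch below is an assumption-laden toy example: the sample values, the common range of 200 m, and `sigma_s = 0.05` are chosen here for readability and do not come from the paper.

```python
import math

def window_fim(samples, sigma_s):
    """Sum v_perp v_perp^T / (sigma_s^2 rho^2) over (rho, v_perp) samples.

    Returns the entries (J11, J12, J22) of the symmetric 2x2 window matrix.
    """
    a = b = d = 0.0
    for rho, (px, py) in samples:
        c = 1.0 / (sigma_s ** 2 * rho ** 2)
        a += c * px * px
        b += c * px * py
        d += c * py * py
    return a, b, d

def lambda_min(a, b, d):
    """Smallest eigenvalue of [[a, b], [b, d]]."""
    return 0.5 * (a + d) - math.hypot(0.5 * (a - d), b)

sigma_s = 0.05   # hypothetical noise std, m/s
# All transverse components along the same axis: strong but collinear excitation.
collinear = [(200.0, (0.0, 0.5)), (200.0, (0.0, 0.8)), (200.0, (0.0, -0.6))]
# Transverse components spanning two directions: directional diversity.
diverse = [(200.0, (0.0, 0.5)), (200.0, (0.4, 0.3)), (200.0, (0.5, 0.0))]

print("collinear:", lambda_min(*window_fim(collinear, sigma_s)))
print("diverse  :", lambda_min(*window_fim(diverse, sigma_s)))
```

The collinear set yields a zero smallest eigenvalue regardless of how many samples are added, whereas the diverse set yields a strictly positive one.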

4.3. Typical Doppler-Only Degeneracies and Their Symptoms

The above relationships imply four characteristic “poor-geometry” modes that are relevant in leader–follower operation:
  • Purely radial motion ($\mathbf{v}_\perp\approx\mathbf{0}$): the Jacobian vanishes, and individual samples carry no information about $\mathbf{r}$.
  • Matched motion in formation ($\mathbf{v}_{\mathrm{rel}}\approx\mathbf{0}$): the Doppler signal is close to zero, and the information is weak regardless of direction.
  • Large range ($\rho$ large): even with nonzero $\mathbf{v}_\perp$, the information decays as $1/\rho^2$ (cf. (24) and (20)).
  • Lack of variability in the excitation direction: $\mathbf{v}_{\perp,m}$ is nearly collinear throughout the window, so $\mathbf{J}_w$ remains ill-conditioned (in practice: it has a very small minimum eigenvalue), and the uncertainty in the “worst” direction cannot be effectively reduced.
In the following sections, we show how these degeneracies can be detected quantitatively using sliding-window measures (FIM/CRLB) and how they appear in information maps for typical maneuvers.

4.4. Observability Maps vs. Information Maps

For nonlinear systems (and in the presence of noise), the binary label “observable/unobservable” is often of limited practical use. In engineering applications, the more relevant question is: is the information within a given time horizon sufficient to meaningfully reduce uncertainty? In this work, the answer is task-oriented rather than based on a universal threshold. We regard the information within a finite time horizon as sufficient when the recent geometry yields persistent, non-collinear transverse excitation, so that $\mathbf{J}_w(t)$ is well-conditioned and the corresponding lower bound $\mathbf{C}_w(t)$ is at a scale compatible with stable formation keeping. Conversely, if $\lambda_{\min}(\mathbf{J}_w)$ remains close to zero or $\operatorname{tr}(\mathbf{C}_w)$ becomes large, then the recent measurement history is insufficient to meaningfully contract uncertainty, even if the system may still be locally observable in a formal rank-based sense. In the simulation study, this task relevance is later reflected by the uncertainty-related success criterion used in the leader–follower problem. For this reason, instead of formal observability maps, we introduce information maps based on scalar quantities that: (i) account for the noise level $\sigma_s$, (ii) depend on the range $\rho$ and the motion geometry through $\mathbf{v}_\perp$, and (iii) admit a quantitative interpretation in the context of estimation accuracy bounds.
In the simplest variant, one can map the “instantaneous” geometry sensitivity $\|\mathbf{H}\|_2=v_\perp/\rho$ (cf. (18)), which directly measures how strongly a single Doppler sample reacts to perturbations of $\mathbf{r}$. In the practical variant (used later), for a given maneuver and horizon $T_w$, we simulate trajectories from different initial configurations, and then map the sliding-window measures resulting from (26), such as the minimum eigenvalue of $\mathbf{J}_w$ or the corresponding CRLB. Such maps make it possible to:
  • Identify initial configurations with unfavorable information properties in the Doppler-only regime;
  • Compare maneuvers in terms of their “information capability” over a finite horizon;
  • Explain why certain control strategies (e.g., maintaining a stable formation without excitation) lead to a loss of informativeness.

5. Fisher Information and CRLB Bounds

In this work, we use tools from estimation theory to quantitatively describe how much information about the relative position $\mathbf{r}$ is carried by a sequence of Doppler measurements and how the motion geometry affects the achievable estimation accuracy. Our starting point is the Fisher information matrix (FIM) and the resulting Cramér–Rao lower bound (CRLB), which provide a lower bound on the estimation error covariance for given models and noise assumptions [35,36]. To avoid a symbol conflict with the identity matrix, in the remainder we denote the FIM by $\mathbf{J}$ and the identity matrix by $\mathbf{I}_2$.

5.1. FIM for Doppler-Only (Range-Rate)

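We consider a discrete Doppler measurement at time $t_m$:
$$s_m=h\big(\mathbf{r}(t_m),\mathbf{v}_{\mathrm{rel}}(t_m)\big)+\nu_m,\qquad \nu_m\sim\mathcal{N}(0,\sigma_s^2),$$
$$h(\mathbf{r},\mathbf{v}_{\mathrm{rel}})=\mathbf{u}^\top\mathbf{v}_{\mathrm{rel}},\qquad \mathbf{u}=\mathbf{r}/\|\mathbf{r}\|,$$
where $\mathbf{u}$ is the unit line-of-sight (LOS) direction and $\|\mathbf{r}\|=\rho$ denotes the range (distance) between the vehicles.
Let $\mathbf{H}_m\in\mathbb{R}^{1\times 2}$ denote the Jacobian of the measurement function with respect to $\mathbf{r}$ at time $t_m$:
$$\mathbf{H}_m\triangleq\left.\frac{\partial h(\mathbf{r},\mathbf{v}_{\mathrm{rel}})}{\partial\mathbf{r}}\right|_{t=t_m}.$$
The explicit expression for $\mathbf{H}_m$ is given in (15)–(17).
For a scalar measurement with variance $R=\sigma_s^2$, the Fisher information increment is:
$$\mathbf{J}_m=\mathbf{H}_m^\top R^{-1}\mathbf{H}_m=\frac{1}{\sigma_s^2}\,\mathbf{H}_m^\top\mathbf{H}_m\in\mathbb{R}^{2\times 2}.$$
Since $\mathbf{H}_m$ has dimension $1\times 2$, a single contribution $\mathbf{J}_m$ is at most rank 1, which reflects the fact that a single Doppler observation does not identify the full vector $\mathbf{r}$.
Using (17), we obtain a convenient form:
$$\mathbf{J}_m=\frac{1}{\sigma_s^2\rho_m^2}\,\mathbf{v}_{\perp,m}\mathbf{v}_{\perp,m}^\top,$$
which directly shows the geometric scaling: the information in the excited direction grows as $\|\mathbf{v}_{\perp,m}\|^2$ and decays as $1/\rho_m^2$.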

5.2. Information Accumulation over Time and the Sliding-Window Variant

Assuming independent measurement noises $\nu_m$ (i.i.d.) and an additive measurement model, information from successive samples adds up. We define the cumulative version as follows:
$$\mathbf{J}_{\mathrm{cum}}(t)=\sum_{m:\,t_m\le t}\mathbf{J}_m.$$
However, for analyzing geometric degeneracies, time-local information is more relevant, because observability conditions may deteriorate only intermittently (e.g., during phases dominated by radial motion). Therefore, we use a sliding-window variant with horizon $T_w$:
$$\mathbf{J}_w(t)=\sum_{m:\,t-T_w<t_m\le t}\mathbf{J}_m.$$
The horizon $T_w$ is an analysis parameter rather than an online estimator-tuning parameter. Its selection reflects a trade-off between temporal locality and geometric coverage within the window. If $T_w$ is too short, the window may fail to contain sufficiently non-collinear transverse-excitation directions, so even an informative maneuver can appear temporarily ill-conditioned. If $T_w$ is too long, $\mathbf{J}_w(t)$ averages informative and weakly informative phases and becomes less sensitive to transient degeneracy. In the limit of slowly varying geometry, (34) implies that $\mathbf{J}_w(t)$ grows approximately with the number of included samples, $K\approx T_w/T_s$, so the absolute values of $\lambda_{\min}(\mathbf{J}_w)$ and $\operatorname{tr}(\mathbf{C}_w)$ depend on $T_w$.
In this study, we use $T_w=30\ \mathrm{s}$ as a task-matched compromise. With $T_s=1\ \mathrm{s}$, the window contains $K=30$ Doppler samples and, with $\Delta t_{\mathrm{act}}=2\ \mathrm{s}$, about 15 control updates. For the representative maneuvers considered later, this corresponds to approximately 240° of accumulated direction change in the turn case ($\omega=8^\circ\,\mathrm{s^{-1}}$) and 1.5 periods in the sine case ($f=0.05\ \mathrm{Hz}$), which is sufficient to capture directional diversity while remaining short relative to the episode horizon. Therefore, the absolute numerical values in the maps depend on $T_w$, whereas the underlying geometric mechanisms (radial degeneracy, $1/\rho^2$ information decay, and the need for directional diversity) do not.
In practice, $T_w$ corresponds to a realistic estimation and control horizon, and $\mathbf{J}_w(t)$ enables detection of episodes of weak informativeness regardless of the earlier trajectory history. The sliding window thus serves as a time-local indicator of information richness: it emphasizes whether the recent trajectory provides sufficiently strong and non-collinear excitation directions, including during intermittent near-radial phases where $\mathbf{v}_\perp\approx\mathbf{0}$, and it enables degeneracy detection via $\lambda_{\min}(\mathbf{J}_w)$ and the corresponding CRLB.
In practical implementations, not all Doppler samples are used: there may be communication dropouts, rejections by gating, or situations in which the measurement is insufficiently informative (e.g., $\mathbf{v}_\perp\approx\mathbf{0}$). In that case, the summation in (34) includes only the samples accepted by the weighting/gating mechanism.
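A sliding-window accumulator with gating can be sketched as follows. This is only an illustration of the bookkeeping: the class name, the use of a double-ended queue, and `sigma_s = 0.05` are assumptions made here, while $T_w=30$ s, $T_s=1$ s, and the 0.25 m/s transverse threshold follow the text. The example deliberately holds the LOS fixed so that every accepted sample excites the same direction.

```python
from collections import deque
import math

class SlidingWindowFIM:
    """Accumulates gated rank-one Doppler increments over the window (t - T_w, t]."""

    def __init__(self, T_w, sigma_s, v_perp_min=0.25):
        self.T_w, self.sigma_s, self.v_perp_min = T_w, sigma_s, v_perp_min
        self.buf = deque()                       # entries: (t_m, J11, J12, J22)

    def push(self, t, r, v_rel):
        rho = math.hypot(r[0], r[1])
        ux, uy = r[0] / rho, r[1] / rho
        radial = ux * v_rel[0] + uy * v_rel[1]
        px, py = v_rel[0] - radial * ux, v_rel[1] - radial * uy
        if math.hypot(px, py) >= self.v_perp_min:            # gate on transverse excitation
            c = 1.0 / (self.sigma_s ** 2 * rho ** 2)
            self.buf.append((t, c * px * px, c * px * py, c * py * py))
        while self.buf and self.buf[0][0] <= t - self.T_w:   # drop samples outside the window
            self.buf.popleft()

    def matrix(self):
        """Return (J11, J12, J22) of the windowed FIM."""
        return tuple(sum(e[i] for e in self.buf) for i in (1, 2, 3))

    def lambda_min(self):
        a, b, d = self.matrix()
        return 0.5 * (a + d) - math.hypot(0.5 * (a - d), b)

win = SlidingWindowFIM(T_w=30.0, sigma_s=0.05)     # sigma_s is a hypothetical value
for m in range(40):                                # one sample per second (T_s = 1 s)
    win.push(float(m), (200.0, 0.0), (0.0, 1.0))   # fixed LOS, purely transverse motion
print(len(win.buf), win.matrix(), win.lambda_min())
```

Although all 30 samples in the window pass the gate, the smallest eigenvalue stays at zero because the excitation direction never changes, which is the fourth degeneracy listed in Section 4.3.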

5.3. CRLB and Scalar Information Measures

For estimating $\mathbf{r}\in\mathbb{R}^2$, the Cramér–Rao bound over a time window has the following form:
$$\mathbf{C}_w(t)=\mathbf{J}_w(t)^{-1}.$$
In the Doppler-only regime, $\mathbf{J}_w(t)$ may be ill-conditioned or singular; therefore, we use regularization in numerical computations:
$$\mathbf{C}_w(t)=\left(\mathbf{J}_w(t)+\epsilon\,\mathbf{I}_2\right)^{-1},\qquad \epsilon>0.$$
For unbiased estimators satisfying the standard regularity conditions, the multivariate Cramér–Rao inequality yields:
$$\operatorname{Cov}\!\left(\hat{\mathbf{r}}-\mathbf{r}\right)\succeq\mathbf{J}_w(t)^{-1},$$
see, e.g., [35,36].
In singular or nearly singular Doppler-only cases, the inverse $\mathbf{J}_w(t)^{-1}$ may be undefined or numerically unstable. Therefore, in numerical computations, we use the regularized quantity in (36):
$$\mathbf{C}_w(t)=\left(\mathbf{J}_w(t)+\epsilon\,\mathbf{I}_2\right)^{-1},$$
as an optimistic numerical surrogate of the CRLB. When $\mathbf{J}_w(t)$ is positive definite, the matrix inequality:
$$\mathbf{J}_w(t)+\epsilon\,\mathbf{I}_2\succeq\mathbf{J}_w(t)$$
implies:
$$\left(\mathbf{J}_w(t)+\epsilon\,\mathbf{I}_2\right)^{-1}\preceq\mathbf{J}_w(t)^{-1},$$
so the regularized matrix is no larger than the classical CRLB in the positive-semidefinite ordering.
To compare the informativeness of trajectories and configurations using scalar quantities, we use the following indicators:
  • $\lambda_{\min}(\mathbf{J}_w(t))$: information in the worst direction (a practical degeneracy measure; when $\lambda_{\min}\to 0$, the problem is ill-conditioned).
  • $\operatorname{tr}\,\mathbf{C}_w(t)$: the sum of lower bounds on variances; for an unbiased estimator:
    $$\mathbb{E}\,\|\hat{\mathbf{r}}-\mathbf{r}\|^2\ge\operatorname{tr}\,\mathbf{C}_w(t),$$
    which interprets $\operatorname{tr}(\mathbf{C}_w)$ as a lower bound on the mean-squared error (MSE) in $\mathbb{R}^2$.
  • $\det(\mathbf{J}_w(t))$ (equivalently $\det(\mathbf{C}_w(t))$): a D-optimality-type measure, useful for assessing the “volume” (in 2D, the area) of the error ellipse.
  • Optionally, the condition number $\kappa(\mathbf{J}_w)=\lambda_{\max}(\mathbf{J}_w)/\lambda_{\min}(\mathbf{J}_w)$ is used as an indicator of susceptibility to numerical and geometric degeneracies.
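The four indicators can be computed in closed form for a 2×2 information matrix. The sketch below is illustrative: the function name, the returned dictionary keys, and the regularization constant `eps = 1e-9` are assumptions made here (the value of $\epsilon$ used for the maps is not given in this section).

```python
import math

def info_measures(J, eps=1e-9):
    """Scalar informativeness measures for a symmetric 2x2 FIM given as nested tuples."""
    (a, b), (_, d) = J
    mean, gap = 0.5 * (a + d), math.hypot(0.5 * (a - d), b)
    lam_min, lam_max = mean - gap, mean + gap
    # Regularized bound C = (J + eps I)^-1 via the explicit 2x2 inverse.
    ar, dr = a + eps, d + eps
    det_r = ar * dr - b * b
    C = ((dr / det_r, -b / det_r), (-b / det_r, ar / det_r))
    return {
        "lambda_min": lam_min,                          # information in the worst direction
        "trace_C": C[0][0] + C[1][1],                   # optimistic MSE lower bound
        "det_J": a * d - b * b,                         # D-optimality-type measure
        "cond": lam_max / lam_min if lam_min > 0 else math.inf,
        "C": C,
    }

well_conditioned = info_measures(((0.0041, 0.0012), (0.0012, 0.0034)))
rank_deficient = info_measures(((0.0, 0.0), (0.0, 0.3)))
print(well_conditioned["lambda_min"], well_conditioned["trace_C"], well_conditioned["cond"])
print(rank_deficient["lambda_min"], rank_deficient["trace_C"], rank_deficient["cond"])
```

For the rank-deficient matrix the trace of the regularized bound is dominated by $1/\epsilon$, so its numerical value reflects the chosen regularization rather than an achievable accuracy; this is why the text treats it as a surrogate.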

5.4. Information Maps as a Function of the Initial Geometry

To illustrate how the leader–follower geometry and the choice of maneuver affect the informativeness of a Doppler-only measurement, we construct maps based on the sliding-window Fisher information matrix and the corresponding CRLB. Unlike a binary “observable/unobservable” classification, these maps quantify the degree to which the problem is well conditioned over a finite horizon $T_w$ and directly indicate configurations in which estimation is inherently difficult (either due to weak information in the worst direction or due to a large lower bound on the estimation error).
The maps are parameterized by the initial relative configuration $\mathbf{r}_0$, specified by the initial range $\rho_0$ and the navigation bearing $\beta_0$, defined as follows: $\beta_0=0^\circ$ corresponds to the North direction (the $+y$ axis), $\beta_0=90^\circ$ to the East direction (the $+x$ axis), and angles increase clockwise. In the world frame $\mathcal{F}_W$ (East–North), the initial condition is written as follows:
$$\mathbf{r}_0(\rho_0,\beta_0)=\rho_0\begin{bmatrix}\sin\beta_0\\ \cos\beta_0\end{bmatrix}.$$
In the implementation, the angular grid is generated using the mathematical angle convention $\varphi$ (0° on the $+x$ axis and increasing counterclockwise), whereas the plots are presented using the navigation convention $\beta$ (0° at North and increasing clockwise). The mapping between these angles is unambiguous:
$$\beta=(90^\circ-\varphi)\bmod 360^\circ.$$
For clarity and to avoid ambiguities when interpreting the angular axis in the maps, the basic values for the cardinal directions are summarized in Table 3.
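The angle conventions can be checked with a few lines of code. The function names below are illustrative assumptions; the formulas are the two relations stated above.

```python
import math

def math_to_nav(phi_deg):
    """Convert a mathematical angle (0 deg at +x, counterclockwise) to a navigation bearing."""
    return (90.0 - phi_deg) % 360.0

def initial_position(rho0, beta0_deg):
    """r_0 = rho0 * [sin(beta0), cos(beta0)] in the East-North frame."""
    b = math.radians(beta0_deg)
    return (rho0 * math.sin(b), rho0 * math.cos(b))   # (East, North)

for phi in (0.0, 90.0, 180.0, 270.0):
    beta = math_to_nav(phi)
    print(f"phi = {phi:5.1f} deg -> beta = {beta:5.1f} deg, r0 = {initial_position(100.0, beta)}")
```

The loop walks through the four cardinal directions: a mathematical angle of 0° maps to a bearing of 90° (East), 90° maps to 0° (North), 180° maps to 270° (West), and 270° maps to 180° (South).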
For each pair $(\rho_0,\beta_0)$ we generate a deterministic relative trajectory over the horizon $T_w$ by propagating the relative position according to the discrete model:
$$\mathbf{r}_{k+1}=\mathbf{r}_k+T_s\,\mathbf{v}_{\mathrm{rel}}(t_k),\qquad t_k=k\,T_s,$$
where $T_s$ is the Doppler sampling period. We deliberately do not add process noise in the maps, because our goal is to isolate the geometric limitations that arise from the Doppler-only measurement model itself and from information accumulation within the window.
We consider three relative-velocity templates $\mathbf{v}_{\mathrm{rel}}(t)$, consistent with the implementation:
$$\mathbf{v}_{\mathrm{rel}}(t)=\begin{cases}\mathbf{v}_0, & \text{straight}\\ \mathbf{R}(\omega t)\,\mathbf{v}_0, & \text{turn}\\ \mathbf{R}\big(A\sin(2\pi f t)\big)\,\mathbf{v}_0, & \text{sine}\end{cases}$$
where $\mathbf{v}_0=[v_{\mathrm{rel}},\,0]^\top$ (East direction at $t=0$), $\mathbf{R}(\cdot)$ is the 2D rotation matrix, $\omega$ is a constant turn rate, and $(A,f)$ are the amplitude and frequency of the heading oscillation. This selection enables a comparison of three typical situations: no geometric variability (straight), a sustained change in motion direction (turn), and periodic excitation (sine).
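The three templates can be written as a single function. In this sketch the turn rate of 8°/s and the frequency of 0.05 Hz follow the values quoted in Section 5.2, whereas the relative speed of 1 m/s, the heading amplitude of 45°, and the counterclockwise-positive rotation convention are assumptions introduced here for illustration.

```python
import math

def rot(theta, v):
    """Rotate the 2-D vector v counterclockwise by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

def v_rel_template(kind, t, v_rel=1.0, omega=math.radians(8.0),
                   A=math.radians(45.0), f=0.05):
    """Relative-velocity template at time t; v_rel and A are hypothetical values."""
    v0 = (v_rel, 0.0)                                   # East direction at t = 0
    if kind == "straight":
        return v0
    if kind == "turn":
        return rot(omega * t, v0)                       # sustained change of direction
    if kind == "sine":
        return rot(A * math.sin(2.0 * math.pi * f * t), v0)   # oscillating heading
    raise ValueError(f"unknown template: {kind}")

for kind in ("straight", "turn", "sine"):
    print(kind, [tuple(round(c, 3) for c in v_rel_template(kind, t)) for t in (0.0, 5.0, 10.0)])
```

All three templates keep the relative speed constant and differ only in how the direction of motion evolves, which is what the subsequent maps are designed to compare.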
For the trajectory (41), we compute at each step the Jacobian $\mathbf{H}_k$ from (15) and the corresponding information contribution, and then sum them over a window of length $T_w$:
$$\mathbf{J}_w(\rho_0,\beta_0)=\sum_{k=0}^{K-1}\delta_k\,\mathbf{H}_k^\top\,\sigma_s^{-2}\,\mathbf{H}_k,\qquad K=\frac{T_w}{T_s},$$
where $\delta_k\in\{0,1\}$ is a sample-usage indicator (gating). Introducing $\delta_k$ reflects practical filtering: a sample is considered useful only if (i) the range is not too small, (ii) the relative speed does not vanish, and (iii) there is sufficient transverse excitation $v_{\perp,k}=\|\mathbf{v}_{\perp,k}\|$ (cf. (16)), i.e., the component of relative velocity perpendicular to the line of sight does not fall below a threshold. The definitions and gating thresholds, consistent with the settings used to generate the maps, are listed in Table 4. In addition, we report a map of the number of used samples $N_{\mathrm{used}}=\sum_{k=0}^{K-1}\delta_k$, which allows us to disentangle two effects: (a) weak information due to geometry (small $\lambda_{\min}$ despite many samples), and (b) lack of information due to sample rejection by gating (small $N_{\mathrm{used}}$).
Based on $\mathbf{J}_w$, we compute the regularized CRLB:
$$\mathbf{C}_w(\rho_0,\beta_0)=\left(\mathbf{J}_w(\rho_0,\beta_0)+\epsilon\,\mathbf{I}_2\right)^{-1},$$
and for visualization, we select two scalar measures: $\log_{10}\lambda_{\min}(\mathbf{J}_w)$ and $\log_{10}\operatorname{tr}(\mathbf{C}_w)$. A logarithmic scale is used due to the large dynamic range (in particular near degenerate regions). Since we compare three maneuvers, each set of maps (for a given metric) is plotted using a shared color scale, enabling direct comparison. The parameter values used to generate the maps in Figure 3, Figure 4 and Figure 5 are listed in Table 5.
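The computation of a single map cell can be sketched as follows. This is a hedged reconstruction of the procedure described above, not the authors' code: the thresholds `rho_min` and `v_min`, the noise level `sigma_s`, the speed `v_rel`, the amplitude `A`, and `eps` are placeholder assumptions (Tables 4 and 5 are not reproduced here), while $T_w$, $T_s$, $\omega$, $f$, and the transverse threshold follow values quoted in the text.

```python
import math

def map_cell(rho0, beta0_deg, kind, T_w=30.0, T_s=1.0, sigma_s=0.05, v_rel=1.0,
             omega=math.radians(8.0), A=math.radians(45.0), f=0.05,
             rho_min=5.0, v_min=0.05, v_perp_min=0.25, eps=1e-9):
    """Return (N_used, lambda_min(J_w), tr(C_w)) for one initial configuration."""
    b = math.radians(beta0_deg)
    x, y = rho0 * math.sin(b), rho0 * math.cos(b)        # East, North
    j11 = j12 = j22 = 0.0
    n_used = 0
    for k in range(round(T_w / T_s)):
        t = k * T_s
        theta = {"straight": 0.0, "turn": omega * t,
                 "sine": A * math.sin(2.0 * math.pi * f * t)}[kind]
        vx, vy = v_rel * math.cos(theta), v_rel * math.sin(theta)
        rho = math.hypot(x, y)
        if rho > rho_min and math.hypot(vx, vy) > v_min:
            radial = (x * vx + y * vy) / rho
            px, py = vx - radial * x / rho, vy - radial * y / rho   # v_perp
            if math.hypot(px, py) > v_perp_min:                     # delta_k = 1
                c = 1.0 / (sigma_s ** 2 * rho ** 2)
                j11 += c * px * px
                j12 += c * px * py
                j22 += c * py * py
                n_used += 1
        x, y = x + T_s * vx, y + T_s * vy                # noise-free propagation
    lam_min = max(0.5 * (j11 + j22) - math.hypot(0.5 * (j11 - j22), j12), 0.0)
    det_reg = (j11 + eps) * (j22 + eps) - j12 * j12
    trace_C = (j11 + j22 + 2.0 * eps) / det_reg          # trace of (J_w + eps I)^-1
    return n_used, lam_min, trace_C

for kind in ("straight", "turn", "sine"):
    print(kind, map_cell(200.0, 0.0, kind), map_cell(200.0, 90.0, kind))
```

A full map would evaluate `map_cell` on a grid of $(\rho_0,\beta_0)$ values and plot the base-10 logarithms of the second and third outputs, guarding against a zero smallest eigenvalue before taking the logarithm.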
In the context of the information maps, the expressions “good geometry” and “poor geometry” are used operationally rather than as universal binary labels. A configuration is regarded as poor or weakly informative if, within the window $T_w$, one or more of the following occurs: (i) many samples are rejected by gating, i.e., $\delta_k=0$ because $\rho_k\le\rho_{\min}$, $\|\mathbf{v}_{\mathrm{rel},k}\|\le v_{\min}$, or $v_{\perp,k}\le v_{\perp,\min}$; (ii) the retained samples produce a nearly singular windowed information matrix, reflected by a very small $\lambda_{\min}(\mathbf{J}_w)$; and/or (iii) the corresponding lower-bound measure $\operatorname{tr}(\mathbf{C}_w)$ is large. Conversely, good geometry denotes configurations in which many samples are retained and the recent trajectory provides sufficiently strong and directionally diverse transverse excitation, yielding comparatively large $\lambda_{\min}(\mathbf{J}_w)$ and small $\operatorname{tr}(\mathbf{C}_w)$. Thus, Figure 3 identifies weak geometry through the lack of usable samples, while Figure 4 and Figure 5 quantify the same notion through the conditioning of $\mathbf{J}_w$ and the corresponding error bound.
Figure 3, Figure 4 and Figure 5 reveal three qualitatively different cases. For the straight motion, a wide sector around $\beta_0\approx 90^\circ$ and $270^\circ$ (East/West directions) appears, in which many samples are rejected by gating (Figure 3). This is a direct consequence of geometry: for these bearings, the vector $\mathbf{r}$ is approximately parallel to $\mathbf{v}_{\mathrm{rel}}$ (here $\mathbf{v}_0$ points East), leading to an almost purely radial relative motion and $\mathbf{v}_\perp\approx\mathbf{0}$. In the same sector, $\lambda_{\min}(\mathbf{J}_w)$ drops to extremely small values (Figure 4), and $\operatorname{tr}(\mathbf{C}_w)$ increases by many orders of magnitude (Figure 5), which corresponds to practical information degeneracy within the time window.
In the turn maneuver, the changing direction of $\mathbf{v}_{\mathrm{rel}}(t)$ produces persistent excitation over the window, and the information contributions do not accumulate along a single fixed direction. As a result, the map of $\lambda_{\min}(\mathbf{J}_w)$ becomes much more homogeneous with respect to bearing, and the map of $\operatorname{tr}(\mathbf{C}_w)$ indicates consistently lower error bounds across a wide range of angles. A pronounced dependence on range remains: as $\rho_0$ increases, information decreases and the CRLB increases, consistent with the scaling $\|\mathbf{H}\|\propto 1/\rho$ and information $\propto 1/\rho^2$.
The sine maneuver provides periodic excitation and, for most configurations, yields maps similar to turn; however, regions with fewer used samples are visible (Figure 3), which translates into local degradation of $\lambda_{\min}(\mathbf{J}_w)$ and an increase in $\operatorname{tr}(\mathbf{C}_w)$. This corresponds to situations in which, over certain portions of the window, the direction of $\mathbf{v}_{\mathrm{rel}}(t)$ becomes near-radial again, so that $v_\perp$ periodically drops, and some observations either carry little information or are rejected by gating.
In summary, the maps in Figure 3, Figure 4 and Figure 5 provide a compact diagnostic tool: they highlight configurations and maneuver types for which Doppler-only offers good information conditioning, as well as those for which the problem becomes ill-conditioned even before accounting for the specifics of a particular estimator or controller. This enables us to interpret the behavior of filters and motion strategies in the subsequent sections directly through the lens of geometry and Doppler-only information limitations.
Importantly, in the present work the information maps are used offline and are not an online input to the estimator or the controller. Their role is threefold: (i) to characterize a priori which initial geometries and maneuver classes are informative or degenerate in the Doppler-only regime, (ii) to interpret the behavior observed in Monte Carlo simulations, and (iii) to motivate the design of excitation-aware motion strategies. Thus, the information maps are a diagnostic and design-support tool rather than a real-time module of the navigation stack.

6. Estimator and Simulation Implementation

To improve readability, Section 6, Section 7, Section 8 and Section 9 are organized as follows. Section 6 summarizes the common simulation–estimation pipeline used by all methods and the associated information flow. Section 7 defines the compared motion strategies. Section 8 specifies the paired Monte Carlo protocol, success criterion, and reported metrics. Section 9 then answers the two research questions posed in the Introduction: first, how geometry affects finite-horizon informativeness, and second, whether excitation-aware motion strategies improve task performance.
All experiments presented in this work (Monte Carlo simulations, baseline control, and the policy trained with RL) use the same simulation model and the same estimator architecture. This ensures that the observed differences in results stem from the choice of motion strategy, rather than from different models or filtering implementations. In the simulator, the ground-truth states $\mathbf{p}_L(t)$ and $\mathbf{p}_F(t)$, as well as the resulting relative state $\mathbf{r}(t)=\mathbf{p}_L(t)-\mathbf{p}_F(t)$, are generated according to the kinematic model (8). Based on this, we generate Doppler measurements $s_m$ using the model (11)–(13). At the same time, the estimator and the controller have access only to signals available online (see Section 3.4), i.e., noisy Doppler $s_m$, the known leader velocity $\mathbf{v}_L$, and noisy observations of the follower speed and heading, from which we reconstruct $\tilde{\mathbf{v}}_F$ and $\tilde{\mathbf{v}}_{\mathrm{rel}}=\mathbf{v}_L-\tilde{\mathbf{v}}_F$. The “truth” is used exclusively to report metrics and to compute informational quantities (FIM/CRLB) in evaluation mode (it is not used in the estimation process).
To improve readability, Figure 6 summarizes the high-level simulation–estimation–control loop used in all experiments. The diagram intentionally abstracts away from software-level details (e.g., substepping, rendering, and file export) and highlights only the main methodological blocks and their information flow. In particular, it distinguishes clearly between online quantities available to the estimator and controller, and truth-based quantities used only in the evaluation branch.
Time in the implementation is separated into three scales: (i) the integration step $\Delta t$ (motion simulation and filter prediction); (ii) the control step $\Delta t_{\mathrm{act}}$ (action/policy update), and (iii) the Doppler sampling period $T_s$. Within one control step, we perform $n=\Delta t_{\mathrm{act}}/\Delta t$ integration substeps. Prediction of the EKF bank is performed at every substep, whereas the measurement update is called only at times $t_m=m\,T_s$, when a new Doppler sample is available.
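The interplay of the three time scales can be sketched as a nested loop. The control step of 2 s and the sampling period of 1 s follow the values quoted in Section 5.2; the integration step of 0.1 s, the 10 s horizon, and the function name are assumptions made for this illustration. Integer tick counters are used so that the sampling instants are not affected by floating-point drift.

```python
def schedule(horizon_s=10.0, dt=0.1, dt_act=2.0, T_s=1.0):
    """Count control updates, EKF predictions, and Doppler updates over a horizon."""
    n_sub = round(dt_act / dt)        # integration substeps per control step
    per_meas = round(T_s / dt)        # substeps between consecutive Doppler samples
    n_control = n_predict = n_update = 0
    tick = 0
    for _ in range(round(horizon_s / dt_act)):
        n_control += 1                # the action / policy output is refreshed here
        for _ in range(n_sub):
            tick += 1
            n_predict += 1            # EKF-bank prediction at every substep
            if tick % per_meas == 0:
                n_update += 1         # measurement update only at t_m = m * T_s
    return n_control, n_predict, n_update

print(schedule())
```

With these example values, a 10 s horizon contains 5 control updates, 100 prediction steps, and 10 Doppler updates.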

6.1. EKF Bank (Multi-Hypothesis) and Mixture

The estimated state is the relative position of the leader with respect to the follower, $\mathbf{r}\in\mathbb{R}^2$. In the Doppler-only regime, a single observation is scalar and depends on the state $\mathbf{r}$ only through the LOS direction $\mathbf{u}=\mathbf{r}/\|\mathbf{r}\|$ (cf. (11)), rather than through the range $\rho$ itself. In practice, this means that in many geometric configurations the measurement does not uniquely resolve where the leader is relative to the follower in the plane; instead, it imposes a constraint on the projection of the relative motion onto the LOS. For a known input $\mathbf{v}_{\mathrm{rel}}$ and a measurement $s$, the equation:
$$\mathbf{u}^\top\mathbf{v}_{\mathrm{rel}}=s$$
only fixes the value of $\cos\phi$ (the angle between $\mathbf{u}$ and $\mathbf{v}_{\mathrm{rel}}$). In 2D (for $|s|<\|\mathbf{v}_{\mathrm{rel}}\|$) this typically corresponds to two solutions on the unit circle: $\phi$ and $-\phi$, i.e., two symmetric LOS directions. Intuitively, for some trajectories (e.g., motion with little geometric variability), the system may “see” the same Doppler both when the follower is on the left and on the right side of the leader (symmetry about the direction of motion). Moreover, a single measurement contains no information about $\rho$, so the admissible set of states forms an elongated “ridge” (in the likelihood sense), and the posterior is generally non-Gaussian and may be multimodal.
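The two-fold ambiguity can be made concrete by solving the noise-free Doppler equation for the LOS direction. The function name and the numerical values are illustrative assumptions.

```python
import math

def los_candidates(v_rel, s):
    """Unit LOS vectors u satisfying u^T v_rel = s (requires |s| < ||v_rel||)."""
    speed = math.hypot(v_rel[0], v_rel[1])
    if abs(s) >= speed:
        raise ValueError("need |s| < ||v_rel|| for two distinct solutions")
    phi = math.acos(s / speed)                       # angle between u and v_rel
    heading = math.atan2(v_rel[1], v_rel[0])         # direction of v_rel
    return [(math.cos(heading + sign * phi), math.sin(heading + sign * phi))
            for sign in (+1.0, -1.0)]

v_rel = (1.0, 0.5)        # hypothetical relative velocity, m/s
s = 0.3                   # hypothetical noise-free range-rate, m/s
for u in los_candidates(v_rel, s):
    for rho in (50.0, 500.0):                        # any range reproduces the same Doppler
        r = (rho * u[0], rho * u[1])
        doppler = (r[0] * v_rel[0] + r[1] * v_rel[1]) / math.hypot(r[0], r[1])
        print(u, rho, doppler)
```

Both mirror-image directions, at any range, reproduce the same range-rate, which is the ridge-shaped and potentially bimodal likelihood described above.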
This effect is particularly pronounced during episodes of poor geometry, when transverse excitation vanishes ($\mathbf{v}_\perp\approx\mathbf{0}$, see Section 4.3). Then the Jacobian (17) has a very small norm, and Doppler updates have negligible impact on the state: the filter essentially receives no signal that would resolve “which side” the solution lies on. In such a situation, a single EKF (a local Gaussian approximation around one hypothesis) may: (i) commit too early to an incorrect mode (e.g., the wrong sign of the transverse component), (ii) settle on seemingly consistent innovations despite a significant position error, and (iii) fail to recover the correct solution even if the geometry later becomes more informative (because jumping between modes would require a large state change that an EKF will not perform continuously).
For this reason, we use a bank of $N$ EKF filters as a simple multi-hypothesis mechanism (approximating the posterior by a Gaussian mixture). Each hypothesis represents a different possible initial configuration (in particular a different bearing, and thus a different “side” of the solution), and their weights are updated based on the likelihood of subsequent measurements. Only once the trajectory generates sufficiently diverse excitation (persistent excitation) does the geometry begin to discriminate hypotheses unambiguously, so that the weight distribution can concentrate correctly.
A single filter in the bank uses the discrete process model corresponding to (8), with the input given by the reconstructed relative velocity $\tilde{\mathbf{v}}_{\mathrm{rel}}$:
$$\mathbf{r}_{k+1}=\mathbf{r}_k+\Delta t\,\tilde{\mathbf{v}}_{\mathrm{rel},k}+\mathbf{w}_k,$$
$$\mathbf{w}_k\sim\mathcal{N}\!\left(\mathbf{0},\mathbf{Q}_k\right),\qquad \mathbf{Q}_k=q^2\,\Delta t\,\mathbf{I}_2,$$
where $q$ is the process-noise intensity parameter.
In what follows, we consistently distinguish: (i) the state estimate $\hat{\mathbf{r}}$ and (ii) the unit LOS vector $\mathbf{u}=\mathbf{r}/\|\mathbf{r}\|$.
The measurement model in the filter has the Doppler-only form (11)–(13), with the difference that we also use the input $\tilde{\mathbf{v}}_{\mathrm{rel}}$ in the measurement function:
$$s_m=h\big(\mathbf{r}(t_m),\tilde{\mathbf{v}}_{\mathrm{rel}}(t_m)\big)+\nu_m,$$
$$h(\mathbf{r},\tilde{\mathbf{v}}_{\mathrm{rel}})=\mathbf{u}^\top\tilde{\mathbf{v}}_{\mathrm{rel}},\qquad \mathbf{u}\triangleq\frac{\mathbf{r}}{\rho},\qquad \rho=\|\mathbf{r}\|,$$
$$\nu_m\sim\mathcal{N}\!\left(0,\sigma_{s,\mathrm{filt}}^2\right),$$
where $\sigma_{s,\mathrm{filt}}$ is the measurement-noise standard deviation assumed by the filter. In the implementation we choose $\sigma_{s,\mathrm{filt}}\ge\sigma_s$ to account for additional uncertainty due to reconstruction of $\tilde{\mathbf{v}}_{\mathrm{rel}}$ (follower heading and speed errors).
Together with the process-noise intensity $q$, the parameter $\sigma_{s,\mathrm{filt}}$ is one of the main robustness knobs of the EKF bank, because it absorbs not only direct Doppler noise but also effective mismatch caused by imperfect reconstruction of $\tilde{\mathbf{v}}_{\mathrm{rel}}$ and by intra-window motion variability.
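For hypothesis $i$, the innovation and its variance are defined in the standard way:
$$y_m^{(i)}=s_m-h\big(\hat{\mathbf{r}}_{m|m-1}^{(i)},\tilde{\mathbf{v}}_{\mathrm{rel},m}\big),$$
$$S_m^{(i)}=\mathbf{H}_m^{(i)}\mathbf{P}_{m|m-1}^{(i)}\mathbf{H}_m^{(i)\top}+\sigma_{s,\mathrm{filt}}^2,$$
where $\mathbf{H}_m^{(i)}$ is the Jacobian (15) evaluated at $\hat{\mathbf{r}}_{m|m-1}^{(i)}$. The state and covariance update is:
$$\mathbf{K}_m^{(i)}=\mathbf{P}_{m|m-1}^{(i)}\mathbf{H}_m^{(i)\top}\big(S_m^{(i)}\big)^{-1},$$
$$\hat{\mathbf{r}}_{m|m}^{(i)}=\hat{\mathbf{r}}_{m|m-1}^{(i)}+\mathbf{K}_m^{(i)}y_m^{(i)},$$
$$\mathbf{P}_{m|m}^{(i)}=\big(\mathbf{I}_2-\mathbf{K}_m^{(i)}\mathbf{H}_m^{(i)}\big)\mathbf{P}_{m|m-1}^{(i)}\big(\mathbf{I}_2-\mathbf{K}_m^{(i)}\mathbf{H}_m^{(i)}\big)^\top+\mathbf{K}_m^{(i)}\sigma_{s,\mathrm{filt}}^2\mathbf{K}_m^{(i)\top},$$
which is the Joseph form of the covariance update and ensures that $\mathbf{P}_{m|m}^{(i)}$ remains symmetric and positive semidefinite.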
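A scalar Doppler update for one hypothesis can be sketched directly from these equations. The function name and the numerical example (a leader 100 m North, prior covariance $100\,\mathbf{I}_2$, filter noise 0.1 m/s) are assumptions chosen so that the result is easy to check by hand.

```python
import math

def ekf_doppler_update(r_hat, P, s, v_rel, sigma_filt):
    """One EKF measurement update with the Joseph-form covariance (2-D state, scalar s)."""
    rho = math.hypot(r_hat[0], r_hat[1])
    ux, uy = r_hat[0] / rho, r_hat[1] / rho
    h = ux * v_rel[0] + uy * v_rel[1]                              # predicted range-rate
    H = ((v_rel[0] - h * ux) / rho, (v_rel[1] - h * uy) / rho)     # v_perp^T / rho
    PHt = (P[0][0] * H[0] + P[0][1] * H[1], P[1][0] * H[0] + P[1][1] * H[1])
    S = H[0] * PHt[0] + H[1] * PHt[1] + sigma_filt ** 2            # innovation variance
    K = (PHt[0] / S, PHt[1] / S)                                   # Kalman gain
    y = s - h                                                      # innovation
    r_new = (r_hat[0] + K[0] * y, r_hat[1] + K[1] * y)
    A = ((1.0 - K[0] * H[0], -K[0] * H[1]),
         (-K[1] * H[0], 1.0 - K[1] * H[1]))                        # I - K H
    AP = [[sum(A[i][k] * P[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
    P_new = [[sum(AP[i][k] * A[j][k] for k in range(2)) + sigma_filt ** 2 * K[i] * K[j]
              for j in range(2)] for i in range(2)]                # A P A^T + K R K^T
    return r_new, P_new, y, S

P0 = ((100.0, 0.0), (0.0, 100.0))
r_new, P_new, y, S = ekf_doppler_update((0.0, 100.0), P0, s=0.01,
                                        v_rel=(1.0, 0.0), sigma_filt=0.1)
print(r_new, P_new, y, S)
```

In this example the transverse (East) variance is halved while the radial (North) variance is left unchanged, in line with the rank-one character of the Doppler update.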
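We perform the measurement update only for samples that satisfy the gating conditions (Table 4):
$$\hat{\rho}_{m|m-1}>\rho_{\min},\qquad \|\tilde{\mathbf{v}}_{\mathrm{rel},m}\|>v_{\min},\qquad v_{\perp,m}>v_{\perp,\min},$$
where $\hat{\rho}_{m|m-1}$ and $\hat{\mathbf{u}}_{m|m-1}$ are computed from the EKF-bank mixture prediction (online):
$$\hat{\mathbf{r}}_{m|m-1}=\sum_{i=1}^{N}w_{m-1}^{(i)}\,\hat{\mathbf{r}}_{m|m-1}^{(i)},$$
$$\hat{\rho}_{m|m-1}=\|\hat{\mathbf{r}}_{m|m-1}\|,\qquad \hat{\mathbf{u}}_{m|m-1}=\frac{\hat{\mathbf{r}}_{m|m-1}}{\hat{\rho}_{m|m-1}},$$
$$\mathbf{v}_{\perp,m}=\big(\mathbf{I}_2-\hat{\mathbf{u}}_{m|m-1}\hat{\mathbf{u}}_{m|m-1}^\top\big)\tilde{\mathbf{v}}_{\mathrm{rel},m},\qquad v_{\perp,m}=\|\mathbf{v}_{\perp,m}\|.$$
The condition $v_{\perp,m}>v_{\perp,\min}$ eliminates (nearly) radial cases for which the Jacobian (17) vanishes and the update has negligible informational effect.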
Due to the prior ambiguity in the Doppler-only regime, we use a bank of $N$ filters $\{\mathrm{EKF}^{(i)}\}_{i=1}^{N}$ to represent alternative initial hypotheses (see Figure 7). In the hard variant, we initialize the hypotheses on a ring of radius $\rho_{\mathrm{guess}}$ with a uniform angular distribution:
$$ \hat{\mathbf{r}}_0^{(i)} = \rho_{\mathrm{guess}} \begin{bmatrix} \cos\big(\varphi_0 + 2\pi (i-1)/N\big) \\ \sin\big(\varphi_0 + 2\pi (i-1)/N\big) \end{bmatrix}, \qquad i = 1, \ldots, N. $$
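The ring initialization can be illustrated as follows; the function name and the example values (eight hypotheses, 150 m radius) are assumptions chosen only for illustration.

```python
import numpy as np

def init_ring(n_hyp, rho_guess, phi0=0.0):
    """Place n_hyp initial position hypotheses uniformly on a ring of radius rho_guess."""
    angles = phi0 + 2.0 * np.pi * np.arange(n_hyp) / n_hyp   # i - 1 = 0, ..., N - 1
    return rho_guess * np.column_stack((np.cos(angles), np.sin(angles)))

hypotheses = init_ring(n_hyp=8, rho_guess=150.0)
```

Because the angles are uniformly spaced, the initial mixture mean lies at the origin, so the bank does not favor any bearing before the first informative measurement.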
We update hypothesis weights $w_m^{(i)}$ (with $\sum_i w_m^{(i)} = 1$) based on measurement likelihood, derived from (51) and (52). For a scalar measurement, the log-likelihood is:
$$ \ell_m^{(i)} = -\frac{1}{2} \left[ \ln\big(2\pi S_m^{(i)}\big) + \frac{\big(y_m^{(i)}\big)^{2}}{S_m^{(i)}} \right], $$
and we use a tempered weight update:
$$ \tilde{w}_m^{(i)} \propto w_{m-1}^{(i)} \exp\!\left( \frac{1}{\tau} \ell_m^{(i)} \right), $$
$$ w_m^{(i)} = \frac{\tilde{w}_m^{(i)}}{\sum_j \tilde{w}_m^{(j)}}, $$
where $\tau > 0$ is the tempering parameter (for $\tau > 1$ the weight distribution is less concentrated, reducing the risk of “prematurely” zeroing alternative hypotheses based on a single sample). In the present study, $\tau = 1.5$ was selected empirically as a conservative compromise rather than derived from a formal uncertainty bound. Smaller values led to more aggressive weight concentration and premature loss of alternative hypotheses under weak Doppler geometry, whereas larger values delayed correct concentration when the geometry later became informative. The chosen value was retained because it reduced premature mode collapse while still allowing effective discrimination of hypotheses once sufficiently informative motion appeared. Additionally, we use two regularization mechanisms: a weight floor $w_{\min}$ and relaxation toward the uniform distribution:
$$ w_m^{(i)} \leftarrow \max\big(w_m^{(i)}, w_{\min}\big), \qquad w_m^{(i)} \leftarrow \frac{w_m^{(i)}}{\sum_j w_m^{(j)}}, $$
$$ w_m^{(i)} \leftarrow (1 - \alpha)\, w_m^{(i)} + \alpha \frac{1}{N}, \qquad 0 \leq \alpha \leq 1, $$
where $\alpha$ is a “forgetting” parameter. These mechanisms improve numerical conditioning and limit the permanent loss of hypotheses due to a single random noise realization.
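The weight update, floor, and relaxation can be sketched as below. The value $\tau = 1.5$ is taken from the text, whereas the floor and forgetting values are hypothetical placeholders (the actual values are listed in Table 6, which is not reproduced here). Working in the log domain is an implementation choice of this sketch, not something stated in the source.

```python
import numpy as np

def update_weights(w_prev, y, S, tau=1.5, w_min=1e-4, alpha=0.01):
    """Tempered likelihood update followed by a weight floor and uniform relaxation.

    w_prev, y, S are arrays of length N (previous weights, innovations, variances).
    w_min and alpha are assumed values used only for illustration.
    """
    loglik = -0.5 * (np.log(2.0 * np.pi * S) + y**2 / S)
    logw = np.log(w_prev) + loglik / tau
    logw -= logw.max()                      # guard against underflow before exponentiating
    w = np.exp(logw)
    w /= w.sum()
    w = np.maximum(w, w_min)                # weight floor
    w /= w.sum()
    return (1.0 - alpha) * w + alpha / len(w)   # relaxation toward uniform

w_new = update_weights(
    w_prev=np.full(4, 0.25),
    y=np.array([0.05, 0.5, 1.0, 3.0]),
    S=np.full(4, 0.1),
)
```

With these values the hypothesis with the smallest innovation receives the largest weight, while the relaxation step keeps every weight at or above $\alpha / N$.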
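The bank output is the aggregated estimate and its covariance (moment matching):
$$ \hat{\mathbf{r}}_m = \sum_{i=1}^{N} w_m^{(i)} \hat{\mathbf{r}}_m^{(i)}, $$
$$ \mathbf{P}_m = \sum_{i=1}^{N} w_m^{(i)} \left[ \mathbf{P}_m^{(i)} + \big(\hat{\mathbf{r}}_m^{(i)} - \hat{\mathbf{r}}_m\big) \big(\hat{\mathbf{r}}_m^{(i)} - \hat{\mathbf{r}}_m\big)^{\top} \right]. $$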
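As a small worked example of the moment matching, the sketch below (hypothetical function name, illustrative numbers) combines two equally weighted hypotheses located 20 m apart; the spread between the hypothesis means appears as additional variance along the axis that separates them.

```python
import numpy as np

def mixture_moments(w, means, covs):
    """Moment-matched mean and covariance of a Gaussian mixture.

    w: (N,), means: (N, 2), covs: (N, 2, 2).
    """
    r_mix = w @ means
    d = means - r_mix                                   # deviations of hypothesis means
    P_mix = np.einsum("i,ijk->jk", w, covs) + np.einsum("i,ij,ik->jk", w, d, d)
    return r_mix, P_mix

r_mix, P_mix = mixture_moments(
    w=np.array([0.5, 0.5]),
    means=np.array([[-10.0, 0.0], [10.0, 0.0]]),
    covs=np.stack([np.eye(2), np.eye(2)]),
)
```

Here the mixture mean is the origin and the mixture covariance is diag(101, 1): the within-hypothesis variance of 1 plus the between-hypothesis spread of 100 along the x-axis.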
Despite tempering (63) and regularization (64) and (65), over long episodes the weight distribution may become strongly concentrated (most mass accumulates on a single hypothesis), especially when the geometry is periodically weakly discriminative. If the dominant hypothesis is inconsistent with the true state, the bank loses diversity and may fail to recover the correct solution even after more informative motion appears later. Therefore, we use a rejuvenation mechanism.
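We monitor concentration via the maximum weight and the effective sample size:
$$ w_{\max} = \max_i w_m^{(i)}, $$
$$ \mathrm{ESS} = \frac{1}{\sum_{i=1}^{N} \big(w_m^{(i)}\big)^{2}}. $$
We trigger rejuvenation when $\mathrm{ESS} < \eta_{\mathrm{ESS}} N$ or $w_{\max} > w_{\mathrm{trig}}$, then perform systematic resampling of hypothesis indices according to the weights, and for each copy we apply jitter and covariance inflation:
$$ \hat{\mathbf{r}}^{(i)} \leftarrow \hat{\mathbf{r}}^{(i)} + \boldsymbol{\xi}, \qquad \boldsymbol{\xi} \sim \mathcal{N}\big(\mathbf{0}, \sigma_{\mathrm{jit}}^{2} \mathbf{I}_2\big), $$
$$ \mathbf{P}^{(i)} \leftarrow \mathbf{P}^{(i)} + \sigma_{\mathrm{inf}}^{2} \mathbf{I}_2, $$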
and we reset the weights to $w^{(i)} = 1/N$. The EKF-bank and rejuvenation parameters are listed in Table 6. The rejuvenation hyperparameters are chosen to reflect two distinct effects: (i) a plausible spatial dispersion of alternative hypotheses when the posterior becomes overly concentrated under weak Doppler geometry, and (ii) a conservative increase in uncertainty to avoid overconfidence under model mismatch. Concretely, $\sigma_{\mathrm{jit}}$ sets the spatial jitter applied after resampling and is selected to be of the same order as the expected position change and estimator uncertainty over a few control/measurement cycles, so that the bank can “re-explore” nearby modes without destabilizing the filter. The inflation term $\sigma_{\mathrm{inf}}$ is used as a robustness margin that absorbs unmodeled effects (e.g., intra-window acceleration, imperfect $\tilde{\mathbf{v}}_{\mathrm{rel}}$ reconstruction, and intermittent informative samples due to the gating threshold $v_\perp > v_{\perp,\min}$). With $\Delta t_{\mathrm{act}} = 2\,\mathrm{s}$ and typical speeds on the order of $1\,\mathrm{m\,s^{-1}}$ (Table 7), the displacement per control step is about $2\,\mathrm{m}$; thus $\sigma_{\mathrm{jit}} = 6\,\mathrm{m}$ corresponds to a few control steps of local hypothesis exploration, while $\sigma_{\mathrm{inf}} = 10\,\mathrm{m}$ provides a conservative robustness margin against mismatch. In our implementation these parameters were set conservatively and verified by checking that innovations remain statistically consistent (normalized innovation squared (NIS) within expected bounds) and that rejuvenation restores diversity only when the bank becomes degenerate (low ESS/high $w_{\max}$). Accordingly, $\sigma_{\mathrm{jit}} = 6\,\mathrm{m}$ should be interpreted as a simulation-specific estimator-side heuristic tied to the present kinematic scale rather than as a universal physical parameter of UUV maneuverability. It does not act directly on the follower control law; instead, it affects closed-loop behavior only indirectly through estimator recovery after weight collapse.
If $\sigma_{\mathrm{jit}}$ is chosen too small, nearby alternative modes are not sufficiently re-explored and the bank may remain trapped near an incorrect mode; if it is chosen too large, the hypotheses become overly dispersed and the bank re-concentrates too slowly. For platforms with substantially different speeds, control periods, or maneuver envelopes, $\sigma_{\mathrm{jit}}$ should therefore be rescaled, for example relative to the typical displacement over several control steps and/or the current estimator uncertainty.
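The rejuvenation step (trigger, systematic resampling, jitter, inflation, weight reset) can be sketched as follows. The jitter of 6 m and the inflation of 10 m are the values quoted in the text; the trigger thresholds `eta_ess` and `w_trig` are hypothetical placeholders for the Table 6 values, and the function names are illustrative.

```python
import numpy as np

def effective_sample_size(w):
    return 1.0 / np.sum(w**2)

def systematic_resample(w, rng):
    """Systematic resampling: one random offset, N evenly spaced positions."""
    n = len(w)
    positions = (rng.random() + np.arange(n)) / n
    cdf = np.cumsum(w)
    cdf[-1] = 1.0                       # guard against round-off in the last bin
    return np.searchsorted(cdf, positions)

def rejuvenate(means, covs, w, rng, eta_ess=0.5, w_trig=0.9,
               sigma_jit=6.0, sigma_inf=10.0):
    """Return (means, covs, weights, triggered). eta_ess and w_trig are assumed values."""
    n = len(w)
    if effective_sample_size(w) >= eta_ess * n and w.max() <= w_trig:
        return means, covs, w, False
    idx = systematic_resample(w, rng)
    new_means = means[idx] + rng.normal(0.0, sigma_jit, size=(n, 2))   # jitter
    new_covs = covs[idx] + sigma_inf**2 * np.eye(2)                    # inflation
    return new_means, new_covs, np.full(n, 1.0 / n), True

rng = np.random.default_rng(0)
means = np.array([[100.0, 0.0], [0.0, 100.0], [-100.0, 0.0], [0.0, -100.0]])
covs = np.stack([25.0 * np.eye(2)] * 4)
kept = rejuvenate(means, covs, np.full(4, 0.25), rng)
reset = rejuvenate(means, covs, np.array([0.97, 0.01, 0.01, 0.01]), rng)
```

With uniform weights nothing happens, whereas the strongly concentrated weight vector triggers resampling, after which the weights are uniform again and every covariance has grown by $\sigma_{\mathrm{inf}}^2$ on the diagonal.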
For reproducibility, it is useful to distinguish three groups of EKF-bank hyperparameters. The first group consists of the parameters that primarily absorb environmental and model uncertainty, namely the process-noise intensity $q$ in (47) and the assumed measurement-noise level $\sigma_{s,\mathrm{filt}}$ in (50). These are the primary robustness knobs of the filter: if current disturbance, dead-reckoning noise, or intra-window motion variability increase, more conservative values of $q$ and/or $\sigma_{s,\mathrm{filt}}$ are appropriate to preserve innovation consistency. The second group, $(N, \tau, w_{\min}, \alpha)$, mainly controls mixture diversity and numerical stability: increasing $N$ improves multimodal coverage at higher computational cost, while larger $\tau$ or $\alpha$ reduces premature collapse of hypothesis weights. The third group, $(\eta_{\mathrm{ESS}}, w_{\mathrm{trig}}, \sigma_{\mathrm{jit}}, \sigma_{\mathrm{inf}})$, governs rejuvenation. Here, $\sigma_{\mathrm{jit}}$ sets the spatial scale of local re-exploration after resampling, whereas $\sigma_{\mathrm{inf}}$ acts as a robustness margin against model mismatch. In the present study, the parameters were chosen conservatively so that the mixture NIS remained within a reasonable range and rejuvenation was triggered mainly when the bank became degenerate under weak geometry rather than during nominal informative motion. Thus, increased environmental uncertainty would primarily call for more conservative $q$, $\sigma_{s,\mathrm{filt}}$, and possibly $\sigma_{\mathrm{inf}}$, whereas the remaining parameters mainly affect numerical robustness, diversity retention, and computational cost.

6.2. Diagnostic Metrics: NIS and $\bar{v}_\perp$

For hypothesis $i$ at the measurement time $t_m$, we define the normalized innovation squared (NIS) as follows:
$$ \mathrm{NIS}_m^{(i)} = \frac{\big(y_m^{(i)}\big)^{2}}{S_m^{(i)}}, $$
where $y_m^{(i)}$ and $S_m^{(i)}$ are given by (51) and (52). Under a correct model and properly tuned covariances, approximately $\mathrm{NIS}_m^{(i)} \sim \chi_1^2$; exceedances of the corresponding quantiles (e.g., 95%: 3.84, 99%: 6.63) indicate a potential inconsistency.
In the EKF-bank analysis we use the mixture NIS (a weighted average across hypotheses):
$$ \mathrm{NIS}_m = \sum_{i=1}^{N} w_m^{(i)} \mathrm{NIS}_m^{(i)}. $$
NIS is used as a diagnostic signal (and as a reward component in the RL experiment in Section 7.2), while measurement rejection is handled by the gating in (56).
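The quoted $\chi_1^2$ quantiles can be checked with the standard library alone, because a chi-square variable with one degree of freedom is the square of a standard normal variable. The sketch below also shows the mixture NIS; the function names and the example numbers are illustrative assumptions.

```python
from statistics import NormalDist

def chi2_1dof_quantile(p):
    """Quantile of the chi-square distribution with 1 degree of freedom.

    If Z ~ N(0, 1), then P(Z^2 <= q) = 2 * Phi(sqrt(q)) - 1.
    """
    z = NormalDist().inv_cdf(0.5 + p / 2.0)
    return z * z

def mixture_nis(weights, innovations, variances):
    """Weighted average of the per-hypothesis NIS values."""
    return sum(w * y * y / s for w, y, s in zip(weights, innovations, variances))

q95 = chi2_1dof_quantile(0.95)
q99 = chi2_1dof_quantile(0.99)
nis_mix = mixture_nis([0.5, 0.5], [0.2, 0.4], [0.04, 0.04])
consistent = nis_mix <= q95
```

In this example the per-hypothesis NIS values are 1 and 4, so the mixture NIS is 2.5, which lies below the 95% quantile.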
The second diagnostic is a scalar that characterizes the transverse excitation of the LOS geometry, related to the Jacobian norm $\lVert \mathbf{H} \rVert_2 = v_\perp / \rho$ (cf. (18)). For each Doppler sample we define:
$$ \hat{\mathbf{u}}_m \triangleq \frac{\hat{\mathbf{r}}_m}{\lVert \hat{\mathbf{r}}_m \rVert}, $$
$$ v_{\perp,m} = \big\lVert \tilde{\mathbf{v}}_{\mathrm{rel},m} - \big(\hat{\mathbf{u}}_m^{\top} \tilde{\mathbf{v}}_{\mathrm{rel},m}\big) \hat{\mathbf{u}}_m \big\rVert, $$
and then report the average over control step $k$:
$$ \bar{v}_{\perp,k} = \frac{1}{N_k} \sum_{m \in \mathcal{M}_k} v_{\perp,m}, \qquad N_k = |\mathcal{M}_k|, $$
where $\mathcal{M}_k$ is the set of Doppler measurement indices that fall within control step $k$. The metric $\bar{v}_{\perp,k}$ does not include the factor $1/\rho$ nor $\sigma_s$; its role is to indicate whether the trajectory generates the transverse excitation that is a necessary condition for extracting information from Doppler measurements (cf. (24) and Section 4).
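A short sketch of this per-step average is given below. The function names are hypothetical, and returning zero for a control step without Doppler samples is an assumption of the sketch rather than a convention stated in the text.

```python
import numpy as np

def transverse_speed(r_hat, v_rel):
    """Magnitude of the component of v_rel orthogonal to the estimated LOS."""
    u = r_hat / np.linalg.norm(r_hat)
    return float(np.linalg.norm(v_rel - (u @ v_rel) * u))

def mean_transverse_excitation(r_hats, v_rels):
    """Average v_perp over the Doppler samples that fall within one control step."""
    values = [transverse_speed(r, v) for r, v in zip(r_hats, v_rels)]
    return sum(values) / len(values) if values else 0.0   # empty-step convention (assumed)

v_bar = mean_transverse_excitation(
    r_hats=[np.array([10.0, 0.0]), np.array([0.0, 5.0])],
    v_rels=[np.array([3.0, 4.0]), np.array([3.0, 4.0])],
)
```

For the same relative velocity, the two LOS directions give transverse speeds of 4 and 3, so the step average is 3.5; the range (10 m versus 5 m) plays no role in this metric.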

7. Motion Strategies: Baseline and Policy as a Trajectory Generator

In this section, we define two follower control strategies compared in the experiments: (i) a deterministic baseline controller, and (ii) a policy learned via reinforcement learning (RL). In both cases, the controller operates in closed loop using only signals available online, i.e., the relative state estimate and its uncertainty produced by the EKF bank (cf. (66) and (67)), as well as the known/communicated leader velocity $\mathbf{v}_L$. The controller has no access to the simulator “truth” (the true $\mathbf{r}$), neither during execution nor in the RL reward function; quantities computed on “truth” are used exclusively for evaluation. We then interpret the RL results through informational metrics (FIM/CRLB) and task-performance metrics: (i) success and time-to-success defined solely from online quantities ($\mathbf{e}_k$, $\sigma_{\max,k}$), and (ii) the formation error on “truth” $e_{\mathrm{form,true}}(t)$ reported only for evaluation (cf. Section 3.4, Section 8 and Section 9).

7.1. Baseline: Formation Keeping + Transverse Excitation

Let $t_k$ denote the control update instants (every $\Delta t_{\mathrm{act}}$), and let $\hat{\mathbf{r}}_k$ and $\mathbf{P}_k$ be, respectively, the mixture estimate of the relative position and its covariance at time $t_k$ (cf. (66) and (67)). We denote the desired geometric relationship in the world frame by $\mathbf{r}_{\mathrm{des}}^{W}(t_k)$ (defined in Section 3). We define the formation error on the estimate as follows:
$$ \mathbf{e}_k \triangleq \hat{\mathbf{r}}_k - \mathbf{r}_{\mathrm{des}}^{W}(t_k). $$
The baseline controller is defined in the vector space of the follower velocity. In the basic variant (used in the experiments), we employ a proportional term:
$$ \mathbf{v}_{F,k}^{\mathrm{nom}} = \mathbf{v}_{L,k} + \mathbf{K}_P \mathbf{e}_k, $$
where $\mathbf{K}_P \in \mathbb{R}^{2 \times 2}$ is the gain matrix (typically diagonal). This form is consistent with the relative kinematics $\dot{\mathbf{r}} = \mathbf{v}_L - \mathbf{v}_F$ and stabilizes the formation error (assuming a correct estimate and moderate actuator constraints).
However, formation keeping alone does not eliminate episodes of weak Doppler-only informativeness (e.g., near-radial geometry or insufficient variability of the excitation direction). Therefore, we augment the baseline with an excitation term that generates a controlled transverse component.
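As a scalar “uncertainty” measure we use:
$$ \sigma_{\max,k} \triangleq \sqrt{\lambda_{\max}(\mathbf{P}_k)}. $$
When $\sigma_{\max,k}$ exceeds a prescribed threshold, we add to the commanded velocity a component tangential to the line of sight (LOS) defined by the relative estimate $\hat{\mathbf{r}}_k$.
We define the unit LOS direction and the tangential direction in 2D as follows:
$$ \hat{\mathbf{u}}_k \triangleq \frac{\hat{\mathbf{r}}_k}{\lVert \hat{\mathbf{r}}_k \rVert}, \qquad \hat{\mathbf{t}}_k \triangleq \begin{bmatrix} -\hat{u}_{y,k} \\ \hat{u}_{x,k} \end{bmatrix}. $$
Excitation is activated via a saturation function:
$$ g_k \triangleq \mathrm{sat}\!\left( \frac{\sigma_{\max,k} - \sigma_{\mathrm{trig}}}{\sigma_{\mathrm{hi}} - \sigma_{\mathrm{trig}}},\, 0,\, 1 \right), $$
where $\sigma_{\mathrm{trig}}$ is the activation threshold, and $\sigma_{\mathrm{hi}}$ is the value at which excitation reaches full amplitude.
We choose the sign of the tangential component so as to reduce (in the sense of the estimate) the error along the tangential direction:
$$ s_k \triangleq \mathrm{sign}\big(\hat{\mathbf{t}}_k^{\top} \mathbf{e}_k\big), \qquad \mathbf{v}_{F,k}^{\mathrm{exc}} \triangleq g_k\, v_{\mathrm{exc,max}}\, s_k\, \hat{\mathbf{t}}_k, $$
where $v_{\mathrm{exc,max}}$ is the maximum excitation amplitude. The final commanded follower velocity vector is:
$$ \mathbf{v}_{F,k}^{\mathrm{cmd}} = \mathbf{v}_{F,k}^{\mathrm{nom}} + \mathbf{v}_{F,k}^{\mathrm{exc}}. $$
In the implementation, $\mathbf{v}_{F,k}^{\mathrm{cmd}}$ is converted to scalar speed and heading commands and is subject to the constraints imposed in the simulator (speed saturation and turn-rate limitation). The geometric interpretation is shown in Figure 8.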
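The baseline law with the excitation term can be sketched as follows. The gain, thresholds, and amplitude are hypothetical values chosen only for illustration. Two details are assumptions of this sketch because they cannot be read unambiguously from the text: the uncertainty measure is taken as the square root of the largest eigenvalue of the covariance (so that it has units of meters), and the tangent is taken as the LOS direction rotated by +90 degrees. The subsequent conversion to speed and heading commands with saturation is not shown.

```python
import numpy as np

def baseline_command(r_hat, P, r_des, v_leader, K_p,
                     sigma_trig, sigma_hi, v_exc_max, use_excitation=True):
    """Commanded follower velocity: proportional term plus optional tangential excitation."""
    e = r_hat - r_des                                  # formation error on the estimate
    v_cmd = v_leader + K_p @ e                         # nominal (formation-keeping) term
    if not use_excitation:
        return v_cmd
    sigma_max = np.sqrt(np.linalg.eigvalsh(P)[-1])     # assumed: sqrt of largest eigenvalue
    g = np.clip((sigma_max - sigma_trig) / (sigma_hi - sigma_trig), 0.0, 1.0)
    u = r_hat / np.linalg.norm(r_hat)
    t = np.array([-u[1], u[0]])                        # assumed orientation of the tangent
    s = np.sign(t @ e)                                 # zero if there is no tangential error
    return v_cmd + g * v_exc_max * s * t

v_cmd = baseline_command(
    r_hat=np.array([100.0, 0.0]),
    P=np.diag([400.0, 100.0]),
    r_des=np.array([90.0, -5.0]),
    v_leader=np.array([1.0, 0.0]),
    K_p=0.1 * np.eye(2),
    sigma_trig=10.0, sigma_hi=30.0, v_exc_max=0.4,
)
```

In this example the uncertainty of 20 m lies halfway between the two thresholds, so half of the excitation amplitude (0.2 m/s) is added along the tangent, on top of the nominal command.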

7.2. Learned Policy (RL) as an Excitation Strategy

The second strategy is a control policy $\pi_\theta$ learned via reinforcement learning. At each control time $t_k$, the policy receives an observation $\mathbf{o}_k$ composed exclusively of quantities available online, i.e., the estimate $\hat{\mathbf{r}}_k$, uncertainty $\mathbf{P}_k$, EKF-bank diagnostics, and kinematic signals available in the control loop. The policy has no access to the true state $\mathbf{r}$ nor to truth-based metrics.
In the experiments, the observation vector is (full, horizontal notation):
$$ \mathbf{o}_k = \big[\, e_{x,k},\ e_{y,k},\ \hat{r}_{x,k},\ \hat{r}_{y,k},\ v_{L,x,k},\ v_{L,y,k},\ \tilde{v}_{F,x,k},\ \tilde{v}_{F,y,k},\ P_{xx,k},\ P_{yy,k},\ P_{xy,k},\ \sigma_{\max,k},\ \mathrm{NIS}_k,\ \bar{v}_\perp(t_k),\ w_{\max,k},\ \mathrm{ESS}_k / N \,\big], $$
where $\mathbf{e}_k$ is defined in (77), $\sigma_{\max,k}$ in (79), $\mathrm{NIS}_k$ is the normalized innovation for the mixture estimate (cf. (73)), $\bar{v}_\perp(t_k)$ is the transverse-excitation proxy (cf. (76)), and $w_{\max,k}$ and $\mathrm{ESS}_k$ are statistics of the bank weight distribution (cf. (68)). The vector $\tilde{\mathbf{v}}_{F,k}$ comes from follower dead-reckoning (noisy speed and heading measurements, converted to vector form in $\mathcal{F}_W$). This choice ensures that the policy relies only on online information, without using “truth”.
The action $\mathbf{a}_k$ parameterizes the follower motion command. We use a two-dimensional action:
$$ \mathbf{a}_k = \big[ a_k^{(v)},\ a_k^{(\psi)} \big]^{\top} \in [-1, 1]^{2}, $$
which is mapped to a speed increment and a heading increment (or equivalently a turn-rate command), while respecting the actuation constraints of the motion model.
We design the reward function to enforce a trade-off between formation stabilization and maintaining conditions favorable for estimation. The reward uses the formation error computed on the estimate $\mathbf{e}_k$, the uncertainty measure $\sigma_{\max,k}$, and EKF-bank diagnostic indicators:
$$ R_k = -w_e \lVert \mathbf{e}_k \rVert_2 - w_\sigma \sigma_{\max,k} - w_u \lVert \mathbf{a}_k \rVert_2^{2} - w_{\mathrm{nis}} \mathrm{NIS}_k + w_\perp \min\big( \bar{v}_\perp(t_k),\ v_{\mathrm{ref}} \big) - w_t, $$
where $w_e, w_\sigma, w_u, w_{\mathrm{nis}}, w_\perp, w_t \geq 0$ are weights, and $v_{\mathrm{ref}}$ is the target level of transverse excitation. All terms in (86) are computed without using “truth”.
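The structure of this reward can be illustrated with a short function. All weight values and the reference excitation level below are hypothetical; the paper does not list them in this section. The sketch assumes that the formation and uncertainty terms enter linearly, that the action penalty is quadratic, and that the excitation bonus saturates at the reference level.

```python
def reward(e_norm, sigma_max, action, nis, v_perp_bar, w, v_ref):
    """Per-step reward built only from online quantities.

    w is a dict of non-negative weights with keys
    'e', 'sigma', 'u', 'nis', 'perp', 't' (illustrative names).
    """
    action_sq = sum(a * a for a in action)
    return (
        -w["e"] * e_norm                        # formation error on the estimate
        - w["sigma"] * sigma_max                # estimator uncertainty
        - w["u"] * action_sq                    # control effort
        - w["nis"] * nis                        # innovation inconsistency
        + w["perp"] * min(v_perp_bar, v_ref)    # saturated transverse-excitation bonus
        - w["t"]                                # constant time penalty
    )

weights = {"e": 0.1, "sigma": 0.05, "u": 0.01, "nis": 0.02, "perp": 1.0, "t": 0.1}
r_step = reward(20.0, 10.0, (1.0, -1.0), 2.0, 0.8, weights, v_ref=0.5)
r_more_excitation = reward(20.0, 10.0, (1.0, -1.0), 2.0, 5.0, weights, v_ref=0.5)
```

Because the excitation bonus is capped at the reference level, increasing the transverse excitation beyond that level does not increase the reward, so the policy is not encouraged to maneuver more than needed for estimation.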
The episode termination criterion is also defined exclusively through online quantities. We declare success if, for $N_{\mathrm{hold}} = 10$ consecutive control steps, the following conditions are simultaneously satisfied:
$$ \lVert \mathbf{e}_k \rVert_2 < 10\,\mathrm{m}, \qquad \sigma_{\max,k} < 10\,\mathrm{m}. $$
Additionally, the episode is truncated after a time limit of $T_{\max} = 400\,\mathrm{s}$ (timeout).
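The hold logic of this criterion amounts to a counter that is reset whenever either condition is violated. The sketch below uses the thresholds quoted in the text (10 m, 10 m, 10 steps); the class name is hypothetical and the input sequence is made up for illustration.

```python
class SuccessMonitor:
    """Declares success after n_hold consecutive steps within both tolerances."""

    def __init__(self, e_tol=10.0, sigma_tol=10.0, n_hold=10):
        self.e_tol = e_tol
        self.sigma_tol = sigma_tol
        self.n_hold = n_hold
        self.count = 0

    def step(self, e_norm, sigma_max):
        if e_norm < self.e_tol and sigma_max < self.sigma_tol:
            self.count += 1
        else:
            self.count = 0          # any violation restarts the hold window
        return self.count >= self.n_hold

monitor = SuccessMonitor()
# Nine good steps, one violation of the uncertainty bound, then ten good steps.
sequence = [(5.0, 8.0)] * 9 + [(5.0, 12.0)] + [(5.0, 8.0)] * 10
flags = [monitor.step(e, s) for e, s in sequence]
first_success_step = flags.index(True)
```

Success is declared only at the last step of the sequence (index 19), because the single violation at index 9 discards the nine steps accumulated before it.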
For policy learning, we use the Soft Actor–Critic (SAC) algorithm [39]. During training, we randomize initial conditions (the geometry $\mathbf{r}_0$), leader motion parameters, and noise realizations (Doppler and dead-reckoning) to mitigate overfitting to a single scenario. After training, the parameters $\theta$ are frozen, and the policy is used in Monte Carlo tests under the same initial-condition distributions as the baseline.
Figure 9 shows example TensorBoard curves collected during SAC training in the Doppler-only environment. These plots allow one to verify whether the policy learns the intended trade-off between formation stabilization (decrease in the estimated error), estimator consistency (uncertainty control), and generation of transverse geometric excitation (the $\bar{v}_\perp$ measure), which is necessary to extract information from Doppler. In addition, the training logs track the curriculum difficulty level, which helps interpret trend changes over time.

8. Experimental Protocol (Simulations)

In this section, we describe the simulation environment, the procedure for sampling scenarios, the compared control strategies (RL vs. baseline and baseline + exc), and the metrics used for evaluation. The goal of the protocol is to obtain a comparison that is as “fair” as possible: each strategy is tested with the same noise realizations and the same initial conditions (paired random seeds), and the success criterion is based exclusively on quantities available online (the estimate and its uncertainty).

8.1. Environment and Time Discretization

We run the experiments in a 2D simulator (horizontal ENU plane), in which the leader moves with a constant speed and heading within an episode, while the follower is controlled in closed loop by one of the compared strategies (Section 7). The simulation parameters are listed in Table 7.
Time is separated into three scales (cf. Section 6):
  • Integration/prediction step $\Delta t$ (motion simulation and EKF-bank prediction);
  • Control step $\Delta t_{\mathrm{act}}$ (action update in baseline or RL);
  • Doppler sampling period $T_s$ (measurement updates).
In each episode, we execute at most $N_{\max}$ control steps, yielding the time horizon $T_{\max} = N_{\max}\, \Delta t_{\mathrm{act}}$.

8.2. Scenario Sampling and Difficulty Level

Each episode is defined by a random initial state and the leader motion parameters. In evaluation mode (hard), we sample:
  • Initial range $\rho_0 \sim \mathcal{U}(\rho_{\min}, \rho_{\max})$;
  • Initial bearing $\beta_0 \sim \mathcal{U}(0°, 360°)$ and set $\mathbf{r}_0$ according to (39);
  • Leader speed $v_L \sim \mathcal{U}(v_{\min}, v_{\max})$ and leader heading $\psi_L \sim \mathcal{U}(0°, 360°)$.
The follower starts with a random heading, and its dead-reckoning signals (speed and heading) are corrupted by noise, which propagates to $\tilde{\mathbf{v}}_F$ and $\tilde{\mathbf{v}}_{\mathrm{rel}}$ used by the estimator and the controller.
During RL training, we use curriculum learning, i.e., a gradual transition from easier to more difficult conditions (e.g., a narrower distribution of $\rho_0$, less stringent success thresholds), whereas in testing, we report results in the hard mode, which corresponds to the target setting (the widest initial distributions and the most stringent thresholds).

8.3. Compared Control Strategies

For convenience, the compared motion strategies and their role in the study are summarized in Table 8. The controller structures themselves were defined earlier in Section 7; here we only specify which strategies enter the paired Monte Carlo comparison.
The comparison, therefore, includes one geometry-agnostic reference controller (baseline), two excitation-aware strategies (baseline + exc and RL), and one non-intelligent stochastic reference (random).

8.4. Paired Monte Carlo Evaluation

We evaluate each strategy in a Monte Carlo test consisting of $N_{\mathrm{MC}} = 300$ episodes. To ensure fair and directly comparable results, all strategies are tested on the same set of scenarios: for each trial, we draw the initial conditions, the leader motion parameters, and the noise realizations (Doppler and dead-reckoning), and then run every strategy on that identical scenario. This paired-seed protocol reduces the impact of randomness on the comparison and allows the observed differences to be attributed primarily to the control strategy.
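The paired-seed protocol can be sketched as below. Everything here is schematic: the sampling ranges are hypothetical (the actual ones are in Table 7), `run_episode` stands for a simulator call that is not specified in this section, and the scenario dictionary layout is an illustrative choice.

```python
import numpy as np

def sample_scenario(seed, rho_range=(50.0, 300.0), speed_range=(0.5, 1.5)):
    """Draw one scenario; the ranges are assumed values, not those of the paper."""
    rng = np.random.default_rng(seed)
    return {
        "rho0": float(rng.uniform(*rho_range)),
        "beta0": float(rng.uniform(0.0, 2.0 * np.pi)),
        "v_leader": float(rng.uniform(*speed_range)),
        "psi_leader": float(rng.uniform(0.0, 2.0 * np.pi)),
        "noise_seed": int(rng.integers(2**31)),   # shared by all strategies in this trial
    }

def run_paired(strategies, run_episode, n_mc=300, base_seed=0):
    """Run every strategy on the identical scenario for each Monte Carlo trial."""
    results = {name: [] for name in strategies}
    for trial in range(n_mc):
        scenario = sample_scenario(base_seed + trial)
        for name, policy in strategies.items():
            results[name].append(run_episode(policy, scenario))
    return results

# Dummy episode runner that only records which noise seed each strategy saw.
seen = run_paired({"baseline": None, "rl": None},
                  run_episode=lambda policy, sc: sc["noise_seed"], n_mc=5)
```

Since both strategies receive the same scenario object in every trial, per-trial differences in outcome can be compared directly, for example through paired differences in time-to-success.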
During evaluation, the RL policy is executed in deterministic mode (without additional exploration noise), using parameters fixed after training.

8.5. Definition of Success and Episode Termination Criteria

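The success condition is defined solely in terms of online quantities:
$$ \lVert \mathbf{e}_k \rVert_2 < e_{\mathrm{tol}}, \qquad \sigma_{\max,k} < \sigma_{\mathrm{tol}} \qquad \text{for } N_{\mathrm{hold}} \text{ consecutive control steps}, $$
where $\mathbf{e}_k = \hat{\mathbf{r}}_k - \mathbf{r}_{\mathrm{des}}^{W}(t_k)$ (see (77)) and $\sigma_{\max,k} = \sqrt{\lambda_{\max}(\mathbf{P}_k)}$ (see (79)). In hard mode we use more stringent thresholds ($e_{\mathrm{tol}}$, $\sigma_{\mathrm{tol}}$) than in easy mode. An episode is also terminated by a timeout at $T_{\max}$.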

8.6. Metrics and Plots Reported in This Paper

We report results at two levels: (i) episode-level metrics and (ii) time histories.

Episode-Level Metrics

For each episode, we record:
  • success—a success indicator (online condition satisfied);
  • time-to-success $t_s$ (only for successful episodes);
  • final true formation error $e_{\mathrm{form,true}}(t_{\mathrm{end}}) = \lVert \mathbf{r}_{\mathrm{true}}(t_{\mathrm{end}}) - \mathbf{r}_{\mathrm{des}}^{W}(t_{\mathrm{end}}) \rVert$;
  • true estimation error $\lVert \hat{\mathbf{r}}(t) - \mathbf{r}_{\mathrm{true}}(t) \rVert$ (for diagnostics).
In parallel, to interpret Doppler-only limitations, we compute information-theoretic measures using the “truth” (not used in control):
  • The windowed Fisher information $\mathbf{J}_w(t)$ with window $T_w = 30\,\mathrm{s}$ (see (34));
  • $\lambda_{\min}\big(\mathbf{J}_w(t)\big)$ and $\mathrm{tr}\big(\mathbf{C}_w(t)\big)$, where $\mathbf{C}_w(t) = \big(\mathbf{J}_w(t) + \epsilon \mathbf{I}_2\big)^{-1}$ (see (36)).
In the computations we use regularization $\epsilon$ and (for consistency with the filtering pipeline) we sum only over Doppler samples accepted by the gating mechanism.
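A sketch of how such windowed quantities might be computed is given below. It assumes the standard Fisher information for a scalar measurement with Gaussian noise, i.e., a sum of outer products of the measurement Jacobian divided by the noise variance; the exact definition is the one in (34), which is not reproduced in this section. The function names, the noise level, and the regularization value are illustrative assumptions.

```python
import numpy as np

def windowed_fim(r_true, v_rel, accepted, sigma_s):
    """Fisher information accumulated over the gated Doppler samples of one window."""
    J = np.zeros((2, 2))
    for r, v, ok in zip(r_true, v_rel, accepted):
        if not ok:
            continue                                # skip samples rejected by the gate
        rho = np.linalg.norm(r)
        u = r / rho
        H = (v - (u @ v) * u) / rho                 # Jacobian of u^T v with respect to r
        J += np.outer(H, H) / sigma_s**2
    return J

def crlb_trace(J, eps=1e-9):
    """Trace of the regularized inverse (J + eps * I)^-1."""
    return float(np.trace(np.linalg.inv(J + eps * np.eye(2))))

r_samples = [np.array([100.0, 0.0]), np.array([0.0, 100.0])]
v_samples = [np.array([0.0, 1.0]), np.array([1.0, 0.0])]
J_two_dirs = windowed_fim(r_samples, v_samples, [True, True], sigma_s=0.1)
J_one_dir = windowed_fim(r_samples, v_samples, [True, False], sigma_s=0.1)
lam_two = np.linalg.eigvalsh(J_two_dirs)[0]
lam_one = np.linalg.eigvalsh(J_one_dir)[0]
```

With two samples whose transverse directions are orthogonal, the smallest eigenvalue is 0.01; with a single accepted sample the matrix has rank one and the smallest eigenvalue is zero, so the regularized bound is dominated by $1/\epsilon$.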
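Additionally, we report two compact degeneracy indicators:
$$ p_{\mathrm{deg}} \triangleq \frac{\#\{k : \rho_{\mathrm{true},k} > \rho_{\min} \,\wedge\, \lambda_{\min}(\mathbf{J}_w(t_k)) < \lambda_{\mathrm{thr}}\}}{\#\{k : \rho_{\mathrm{true},k} > \rho_{\min}\}}, $$
$$ p_{\perp 0} \triangleq \frac{\#\{k : |\bar{v}_{\perp,k}| < v_{\perp,\mathrm{thr}}\}}{\#\{k\}}. $$
The metric $p_{\perp 0}$ is diagnostic and is not used as the primary comparison metric; it helps interpret episodes of information loss caused by vanishing transverse excitation. In the experiments, we set $\lambda_{\mathrm{thr}} = 10^{-10}$ and $v_{\perp,\mathrm{thr}} = 10^{-3}$.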
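Both indicators are simple fractions of control steps and can be computed as in the sketch below. The thresholds are taken here as 1e-10 and 1e-3, following the values stated in the text; the function names, the range gate, the handling of an empty denominator, and the per-step data are illustrative assumptions.

```python
def degeneracy_fraction(rho_true, lam_min, rho_min, lam_thr=1e-10):
    """Fraction of steps with near-singular window information, among steps with rho > rho_min."""
    valid = [lam for rho, lam in zip(rho_true, lam_min) if rho > rho_min]
    if not valid:
        return float("nan")                 # no step passes the range condition (assumed convention)
    return sum(lam < lam_thr for lam in valid) / len(valid)

def vanishing_excitation_fraction(v_perp_bar, v_thr=1e-3):
    """Fraction of control steps whose mean transverse excitation is practically zero."""
    return sum(abs(v) < v_thr for v in v_perp_bar) / len(v_perp_bar)

p_deg = degeneracy_fraction(
    rho_true=[5.0, 50.0, 60.0, 70.0],
    lam_min=[0.0, 1e-12, 1e-3, 1e-11],
    rho_min=10.0,
)
p_zero = vanishing_excitation_fraction([0.0, 0.5, 1e-4, 0.2])
```

In this made-up episode the first step is excluded by the range condition, two of the remaining three steps are degenerate (fraction 2/3), and half of the steps have vanishing transverse excitation.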

9. Results

In this section we present the results of comparing three follower control strategies: (i) baseline (a P controller stabilizing the formation on the estimate), (ii) baseline + exc (baseline augmented with an excitation component tangent to the LOS; see Section 7.1), and (iii) a policy learned via RL (SAC; see Section 7.2). As a sanity check, we also show a random control, but we do not treat it as a practical method for formation keeping. The results are reported in a question-oriented manner. Section 9.1 and Section 9.2 address Q1 by linking the offline information-map analysis and the sliding-window FIM/CRLB measures to the geometric regimes observed in simulation, and by showing the empirical relationship between transverse excitation and the information metrics. Section 9.3 addresses Q2 through the paired Monte Carlo comparison of the baseline, baseline + excitation, and RL strategies. Section 9.4 then provides a representative case study that explains the mechanism underlying the aggregate comparison. A video file with three representative episodes is also available in the repository provided at the end of this article.
All results are based on 300 Monte Carlo episodes, using the same initial conditions and the same noise realizations for all compared strategies (paired seeds). Success is defined exclusively using quantities available online: the formation error on the estimate and the uncertainty measure ($\sigma_{\max,k}$). Metrics with the suffix true and information measures (FIM/CRLB) are computed on “ground truth” only for evaluation (see Section 3.4). In the information analysis, we report windowed measures over the horizon $T_w$ (see Section 5) as well as the fraction of episodes in which the degeneracy $\lambda_{\min}(\mathbf{J}_w) < 10^{-10}$ occurs.

9.1. Q1: Geometry, Information Maps, and Finite-Horizon Informativeness

The maps in Figure 3, Figure 4 and Figure 5 serve as a reference point for interpreting the behavior of the controller and the estimator. In particular, they predict that (i) radially dominated motion ($v_\perp \approx 0$) and (ii) large ranges $\rho$ lead to very poor Doppler-only informativeness, and episodes with insufficient variability of the excitation direction result in an ill-conditioned matrix $\mathbf{J}_w$. In practical leader–follower episodes, this effect manifests itself as follows: a controller aiming at an “ideal” formation may reduce $\lVert \mathbf{v}_{\mathrm{rel}} \rVert$ and/or $v_\perp$, which over time leads to an increase in the CRLB and a loss of estimation stability. In contrast, the RL policy, trained with an uncertainty penalty, is pressured to generate maneuvers that maintain informativeness over a finite time horizon.

9.2. Q1 (Continued): Correlation of Transverse Excitation with FIM/CRLB

The theoretical relation (20) suggests that information increases with the transverse excitation $v_\perp$ and decreases as $1/\rho^{2}$. Figure 10 shows the empirical relationship between an online excitation indicator (here: $v_\perp$ reported by the estimator) and the information measure $\lambda_{\min}(\mathbf{J}_w)$ in the window. A clear positive correlation is observed: larger transverse excitation translates into larger information in the “worst” direction, which directly supports uncertainty reduction.
Taken together, the information maps in Figure 3, Figure 4 and Figure 5 and the empirical relationship in Figure 10 answer Q1: in Doppler-only relative navigation, finite-horizon informativeness is governed primarily by motion geometry, especially the magnitude and directional diversity of transverse excitation and the inter-vehicle range.

9.3. Q2: Strategy Comparison in Paired Monte Carlo Evaluation

Figure 11 compares the success rate and the distribution of time-to-success (for episodes that ended in success). The RL policy achieves 87.3% successes (262/300), whereas baseline reaches 57.7% (173/300), and baseline + exc 62.3% (187/300). Moreover, RL is clearly faster: the median time-to-success is 114 s, while for baseline and baseline + exc it is 202 s and 204 s, respectively. This means that the RL policy not only drives the system more often to a state satisfying the online success conditions, but also does so in a shorter time.
Figure 12 shows the formation error computed on “truth”, $e_{\mathrm{form,true}} = \lVert \mathbf{r}_{\mathrm{true}} - \mathbf{r}_{\mathrm{des}}^{W} \rVert$. The RL policy reduces the median error faster and keeps it at a lower level throughout the episode. For successful episodes, the median final error on “truth” is 3.44 m (RL), compared to 4.55 m (baseline) and 4.53 m (baseline + exc). In the tail of the distribution (95th percentile) RL is also favorable (5.49 m vs. 6.10 m and 7.26 m), indicating more stable behavior under more challenging noise and geometry realizations.
Figure 13 shows the time evolution of the median windowed metrics: $\lambda_{\min}(\mathbf{J}_w)$ and $\mathrm{tr}(\mathbf{C}_w)$. For baseline and baseline + exc, we observe the typical Doppler-only degeneracy: after a transient phase (between approximately 100 s and 200 s) the information in the worst direction drops by many orders of magnitude, and the corresponding CRLB grows rapidly. This is consistent with the mechanism described in Section 4: maintaining formation without active excitation can lead to $v_\perp \approx 0$ and loss of identifiability. The RL policy maintains noticeably higher informativeness in the window, which translates into a lower CRLB and, consequently, a higher chance of meeting the success condition based on the estimate uncertainty.
Importantly, the CRLB curves in Figure 13 should be interpreted as best-case, geometry-driven reference bounds rather than as direct numerical predictions of EKF-bank covariance. The bound is computed from the idealized Doppler measurement model and assumed Doppler noise level, whereas the practical EKF bank operates with reconstructed $\tilde{\mathbf{v}}_{\mathrm{rel}}$ and is additionally affected by process/model uncertainty, gating, linearization error, and posterior multimodality in the Doppler-only regime. Consequently, when the recent geometry becomes weakly informative, the practical filter typically degrades more strongly than the CRLB alone would suggest: the collapse of $\lambda_{\min}(\mathbf{J}_w)$ and the growth of $\mathrm{tr}(\mathbf{C}_w)$ coincide with slowed or stalled contraction of the EKF-bank uncertainty, rather than with a one-to-one quantitative match. Thus, in the present study the CRLB is used as an optimistic geometry-based benchmark, while the EKF-bank trajectories show how this information loss manifests itself in practical convergence behavior. Table 9 summarizes key metrics. Importantly, RL generates substantially larger transverse excitation (mean $v_\perp$) than baseline, which is consistent with the scaling in (20) and explains the improvement in FIM/CRLB. Adding the excitation term in baseline + exc improves the results only marginally: we still observe a large fraction of steps with (practically) singular information in the window.
The Monte Carlo experiments (Section 8 and Section 9) confirm that Doppler-only estimation performance is governed by the time-varying measurement geometry. When the trajectory provides sustained transverse excitation and non-collinear excitation directions, the estimator uncertainty contracts and the position error remains bounded. Conversely, in weak-geometry episodes (e.g., nearly radial motion with $v_\perp \approx 0$), Doppler updates contribute little information and the filter uncertainty and errors can grow rapidly. These trends are consistent with the sliding-window metrics: periods of low $\lambda_{\min}(\mathbf{J}_w)$ (or large CRLB) align with increases of the empirical error across runs, whereas periods of high $\lambda_{\min}(\mathbf{J}_w)$ align with stable error reduction.
These results answer Q2: compared with geometry-agnostic formation control, motion strategies that preserve informative geometry over a finite horizon lead to higher task success, faster convergence, and better information conditioning in the Doppler-only regime.

9.4. Case Study

To provide an illustration of the mechanism behind the Monte Carlo results from Section 9.3, we analyze below a single, representative episode (the same initial conditions and the same noise realization for all strategies, in accordance with Section 8.4). The selected episode is representative of the Doppler-only regime: the differences between the methods do not stem solely from the formation-task geometry, but primarily from how the controller affects the informativeness of the motion over a finite time window (cf. Section 4 and Section 5).
Figure 14 shows that all methods (except random control) can roughly bring the follower into the vicinity of the desired trajectory; however, the approach geometry is qualitatively different. The RL policy executes an approach with a clear transverse component (turn/arc), whereas baseline (P) and baseline + exc converge more “straight” towards the formation geometry and stabilize the heading relatively quickly. In Doppler-only, such an overly “calm” trajectory can be favorable for the formation itself, but informationally unfavorable, because it reduces the transverse excitation $v_\perp$ and leads to episodes of weak observability over time.
Figure 15a–c illustrate the key Doppler-only conflict: a low formation error on “truth” does not guarantee low estimate uncertainty. In this episode, baseline (P) is able to reduce the formation error (Figure 15a), but at the same time, $\sigma_{\max}(t)$ stabilizes at an elevated level and does not decrease to the values required by the online success criterion (Figure 15b). Baseline + exc improves the situation relative to plain P (the uncertainty is smaller), but it still remains in a “borderline” regime and does not achieve as stable an uncertainty reduction as RL.
This representative episode also makes the distinction between theoretical and practical performance more explicit. The deterioration of the information metrics in Figure 15c identifies the onset of a weak-geometry regime, but the EKF-bank response in Figure 15b is more severe than the optimistic bound alone would suggest: once the recent trajectory ceases to provide informative transverse excitation, the practical filter uncertainty no longer contracts effectively and may remain elevated because the estimator is additionally affected by input reconstruction errors, linearization, and Doppler-only mode ambiguity.
The mechanism is visible directly in Figure 15c. In line with the information scaling (20) and the Jacobian interpretation (18), the key is to maintain the transverse component $v_\perp$ (here: $\bar{v}_\perp$). For baseline (P) and baseline + exc, after the transient phase, the transverse excitation practically vanishes ($\bar{v}_\perp \approx 0$), and $\lambda_{\min}(\mathbf{J}_w)$ drops to the level of numerical degeneracy. Consequently, within the window $T_w$ the estimation problem becomes ill-conditioned and the filter has no basis to further reduce the uncertainty. The RL policy maintains markedly larger transverse excitation; as a result, $\lambda_{\min}(\mathbf{J}_w)$ stays significantly higher, and the estimate uncertainty decreases to a level that allows the episode to end in success (the RL curves terminate in Figure 15b,c).
The differences in excitation are directly visible in the follower motion signals (Figure 15d–f). The RL policy maintains a high speed in the approach phase and performs pronounced heading maneuvers (nonzero $\omega_F$ over a substantial part of the episode), which translates into maintaining $v_\perp$. Baseline (P) and baseline + exc, after a short transient, stabilize the course (practically $\omega_F \approx 0$), which in Doppler-only promotes “settling” into a weakly informative geometry (radially dominated motion and/or lack of excitation direction variability), and thus leads to a drop in $\lambda_{\min}(\mathbf{J}_w)$ in the window. Random control does generate strong, stochastic heading changes, but it does not accomplish the formation objective (the error grows in Figure 15a) and is not a practical alternative.
In summary, the case study confirms the central thesis of the paper: in Doppler-only, success is determined not only by “moving toward the formation”, but also by whether the trajectory provides persistent geometric excitation in the sense of Section 4. The RL policy behaves like a controller that actively avoids information degeneracy: it generates maneuvers that maintain $v_\perp$ and thereby enable uncertainty reduction over a finite horizon, which translates into meeting the success criterion defined solely from online signals.

10. Discussion

The presented results confirm that in the Doppler-only regime (range-rate without range), the observability of the relative position is not an inherent, time-invariant property of the system, but rather a property of the trajectory and geometry over a finite horizon. The information maps based on the sliding-window Fisher information matrix and the CRLB (Figure 3, Figure 4 and Figure 5) accurately predict in which initial configurations and for which motion patterns episodes of weak information are likely to occur. Two degeneracy mechanisms are particularly important, and both follow directly from the structure of the Doppler measurement Jacobian: the vanishing of information for (almost) radial motion, when $v_\perp \approx 0$, and the decay of information with increasing range, due to the $1/\rho^2$ scaling. In practice, this means that even a correctly operating estimator may periodically “lose ground” during motion phases that do not provide sufficiently diverse geometric excitation, and the problem cannot be resolved solely by filter tuning.
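The two degeneracy mechanisms can be illustrated with a minimal sketch, assuming the 2D range-rate model $\dot{\rho} = \mathbf{u}^\top \mathbf{v}_{rel}$ and independent Gaussian Doppler noise with standard deviation `sigma`. The function names, the tuple representation of the symmetric 2×2 matrix, and the numerical values are illustrative assumptions and are not taken from the accompanying repository.

```python
import math

def doppler_jacobian(r, v_rel):
    """Jacobian of the range-rate u^T v_rel with respect to the relative position r (2D).

    Equals the transverse velocity (I - u u^T) v_rel divided by the range rho,
    so its norm is v_perp / rho.
    """
    rho = math.hypot(r[0], r[1])
    u = (r[0] / rho, r[1] / rho)                    # line-of-sight unit vector
    radial = u[0] * v_rel[0] + u[1] * v_rel[1]      # radial speed u^T v_rel
    v_perp = (v_rel[0] - radial * u[0], v_rel[1] - radial * u[1])
    return (v_perp[0] / rho, v_perp[1] / rho)

def window_fim(samples, sigma):
    """Sum the rank-one terms H^T H / sigma^2 over (r, v_rel) samples in a window.

    Returns (a, b, c), the entries of the symmetric matrix [[a, b], [b, c]].
    """
    a = b = c = 0.0
    weight = 1.0 / sigma ** 2
    for r, v_rel in samples:
        hx, hy = doppler_jacobian(r, v_rel)
        a += weight * hx * hx
        b += weight * hx * hy
        c += weight * hy * hy
    return a, b, c

def lambda_min(fim):
    """Smallest eigenvalue of a symmetric 2x2 matrix given as (a, b, c)."""
    a, b, c = fim
    return 0.5 * (a + c) - math.hypot(0.5 * (a - c), b)

# Hypothetical windows: purely radial motion vs. two transverse samples
# taken from different line-of-sight directions.
radial_window = [((100.0, 0.0), (1.0, 0.0))] * 5
diverse_window = [((100.0, 0.0), (0.0, 1.0)), ((0.0, 100.0), (1.0, 0.0))]
print(lambda_min(window_fim(radial_window, 0.1)))
print(lambda_min(window_fim(diverse_window, 0.1)))
```

In this sketch the radial window yields zero information in every direction, a single transverse sample would give a rank-one matrix, and doubling the range at fixed transverse speed reduces each rank-one contribution by a factor of four.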
The comparison of control strategies highlights the consequences of this phenomenon in the leader–follower task. The baseline controller, whose natural objective is to minimize the formation error, tends to reduce the relative speed as well as its transverse component, which improves offset keeping but simultaneously reduces the Doppler-only information content in the time window and can lead to increased estimation uncertainty. In the Monte Carlo results, this effect appears as a decrease in $\lambda_{\min}(J_w)$ and an increase in $\mathrm{tr}(C_w)$ for baseline/baseline + exc (Figure 13), and consequently as a lower success rate and longer time to reach the success condition (Figure 11, Table 9). In contrast, the RL policy, trained with a penalty on uncertainty and a bonus for transverse excitation, generates maneuvers that keep $v_\perp$ at a higher level. This is consistent with both the theoretical information scaling and the empirical relationship between $v_\perp$ and $\lambda_{\min}(J_w)$ (Figure 10). In this sense, RL does not “bypass” Doppler-only limitations; instead, it learns a practical trade-off between stabilizing the formation and actively maintaining informative conditions.
At the same time, the FIM/CRLB measures used in the analysis are intentionally idealized and should be interpreted as geometry-driven best-case references rather than as direct predictors of EKF-bank covariance. These bounds follow from the Doppler measurement model and the assumed Doppler noise level, with $\mathbf{v}_{rel}$ treated as known, whereas in the online loop, it is reconstructed from the follower dead reckoning and may be erroneous. In addition, the practical EKF bank is affected by process/model mismatch, gating, linearization, and the multimodal ambiguity characteristic of the Doppler-only regime. For this reason, weak-geometry episodes can degrade the practical filter more severely than the CRLB alone would suggest. The value of the comparison is therefore primarily diagnostic: the FIM/CRLB traces indicate when the recent trajectory ceases to be sufficiently informative, while the EKF-bank behavior shows how this information loss translates into slowed convergence or persistently elevated uncertainty in practice.
A concrete next step is to move from the present offline diagnostic use of FIM/CRLB to an online information-aware controller. For example, at control time $t_k$, a future MPC formulation could optimize a control sequence $\mathbf{u}_{k:k+H-1}$ over a horizon $H$ by minimizing a standard formation/effort cost augmented by an information term:
$$\sum_{j=0}^{H-1}\left(\left\|\hat{\mathbf{r}}_{k+j|k}-\mathbf{r}_{des}^{W}(t_{k+j})\right\|_{Q}^{2}+\left\|\mathbf{u}_{k+j|k}\right\|_{R}^{2}\right)+w_{I}\,\Phi\!\left(\hat{J}_{w,k+H|k}\right),$$
where $\hat{J}_{w,k+H|k}$ is a predicted sliding-window information matrix computed from the estimated state, the reconstructed relative velocity, and the predicted geometry under the candidate control sequence. If the goal is to prevent loss of information in the worst direction, one may choose:
$$\Phi(\hat{J}_w)=\max\left\{0,\ \lambda_{tar}-\lambda_{\min}(\hat{J}_w)\right\}^{2}.$$
If a smoother criterion is preferred, one may instead use:
$$\Phi(\hat{J}_w)=\mathrm{tr}\!\left[\left(\hat{J}_w+\epsilon I_2\right)^{-1}\right]$$
or
$$\Phi(\hat{J}_w)=-\log\det\left(\hat{J}_w+\epsilon I_2\right).$$
In all three cases, the controller would trade off formation keeping against predicted future informativeness and would activate bounded arc-like or lateral maneuvers only when the predicted information level becomes too low. Importantly, unlike the offline information maps used in the present study, such a controller would evaluate the information metric online from estimated quantities rather than from ground truth.
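As a rough illustration of the three candidate information terms, the following sketch evaluates a hinge penalty on the smallest eigenvalue, a regularized trace of the inverse, and a regularized negative log-determinant for a 2×2 predicted information matrix. The tuple representation `(a, b, c)` of the symmetric matrix, the function names, and the numerical values of `lam_tar` and `eps` are assumptions made for this example only; the sign of the log-determinant term is chosen so that all three terms are costs to be minimized.

```python
import math

def lambda_min(J):
    """Smallest eigenvalue of the symmetric 2x2 matrix [[a, b], [b, c]] given as (a, b, c)."""
    a, b, c = J
    return 0.5 * (a + c) - math.hypot(0.5 * (a - c), b)

def phi_hinge(J, lam_tar):
    """Squared shortfall of lambda_min below a target level; zero when the target is met."""
    return max(0.0, lam_tar - lambda_min(J)) ** 2

def phi_trace_inv(J, eps):
    """Trace of (J + eps*I)^-1, a regularized CRLB-type (A-optimality) cost."""
    a, b, c = J
    a, c = a + eps, c + eps
    return (a + c) / (a * c - b * b)   # closed form for a 2x2 inverse trace

def phi_neg_logdet(J, eps):
    """Negative log-determinant of J + eps*I, a D-optimality-type cost."""
    a, b, c = J
    a, c = a + eps, c + eps
    return -math.log(a * c - b * b)

# Hypothetical predicted information matrices: well conditioned vs. nearly rank-one.
J_good = (4.0, 0.0, 1.0)
J_poor = (1.0, 0.0, 0.0)
for J in (J_good, J_poor):
    print(phi_hinge(J, 0.5), phi_trace_inv(J, 1e-6), phi_neg_logdet(J, 1e-6))
```

The hinge term is inactive whenever the predicted worst-direction information exceeds `lam_tar`, whereas the two smooth terms grow continuously as the matrix approaches singularity, with `eps` bounding them in the fully degenerate case.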
The results are also strongly related to the ambiguity mechanism characteristic of Doppler-only. Using an EKF bank together with tempering and rejuvenation improves stability in situations where information is periodically weak and individual samples do not resolve competing hypotheses. However, this comes at a higher computational cost and sensitivity to parameter choices (number of hypotheses, ESS threshold, jitter/inflation). In practical underwater systems, an additional limitation can be the irregularity of the acoustic channel (dropouts, delays, Doppler outliers), which is not fully modeled in this work. For this reason, a natural next step is to extend the simulation environment with more realistic measurement and communication models and to assess to what extent the geometric degeneracy mechanisms described here persist under non-Gaussian disturbances.
A final important limitation is the use of a 2D model and the assumption of known leader velocity, which simplifies both filtering and the information analysis. Extending the framework to 3D and accounting for uncertainty in the leader motion model (e.g., via joint estimation or by treating the leader velocity as an uncertain parameter) would increase realism and could change the quantitative values of the information bounds. Regardless of these limitations, the qualitative conclusions remain clear: in Doppler-only, actively shaping the geometry is essential, and finite-horizon information content provides a practical “bridge” between observability theory and control design.

11. Conclusions

This work analyzed relative navigation in a leader–follower setting for the Doppler-only case, where the only acoustic observation is the range-rate. The main conclusion is that, in this regime, finite-horizon estimation performance is governed primarily by motion geometry rather than by the measurement model alone. A single Doppler sample contributes at most rank-one information, and effective reduction of relative-position uncertainty requires sufficiently strong and directionally diverse transverse excitation over time. The dependence of the Jacobian on the transverse component of the relative velocity and the corresponding information scaling as $v_\perp^2/\rho^2$ explain why predominantly radial motion and large inter-vehicle separation lead to rapid information loss.
To quantify this effect, we introduced sliding-window information measures based on the Fisher information matrix and the Cramér–Rao lower bound, and we used them to construct information maps for representative maneuvers. In the present formulation, these maps are not an online input to the estimator or controller; rather, they serve as an offline diagnostic and design-support tool. Their role is to identify informative and degenerate geometry regimes, to compare maneuver classes in terms of finite-horizon informativeness, and to explain the behavior of the estimator and controller observed in simulation.
The simulation study supports both research questions posed in the Introduction. First, the information maps and sliding-window FIM/CRLB analysis show that finite-horizon informativeness in Doppler-only relative navigation is controlled mainly by transverse excitation, directional diversity, and range. Second, the comparison of motion strategies shows that control policies that preserve informative geometry can substantially improve task performance relative to geometry-agnostic formation control. In particular, the RL policy achieved a higher success rate and shorter time to success than the baseline controller, while maintaining stronger transverse excitation and better information conditioning in the Doppler-only regime.
From a practical perspective, the results do not imply that a UUV must oscillate continuously. Rather, they show that the controller should avoid long intervals of weak geometry when estimator uncertainty is high. In practice, the required excitation may be generated by occasional turns, arc-like approach maneuvers, or temporary lateral corrections. This interpretation makes the notion of persistent excitation compatible with realistic leader–follower tasks.
The present study also has clear limitations. The analysis is based on a 2D model, assumes the leader velocity is externally available, and uses FIM/CRLB mainly as a geometry-driven diagnostic bound rather than as a full predictor of EKF-bank error under all model mismatches. Future work should extend the framework to 3D, include uncertainty in the leader motion model, and incorporate more realistic acoustic-channel effects such as dropouts, delays, and outliers. Another promising direction is to incorporate information measures directly into controller design. More concretely, a future MPC formulation could augment the standard formation-keeping and control-effort objective with a penalty based on the predicted finite-horizon information level, for example, through a term such as $\max\{0,\ \lambda_{tar}-\lambda_{\min}(\hat{J}_w)\}^{2}$, or alternatively a CRLB-based criterion such as $\mathrm{tr}[(\hat{J}_w+\epsilon I_2)^{-1}]$, or a smoother D-optimality-type term such as $-\log\det(\hat{J}_w+\epsilon I_2)$, all evaluated online from the estimated geometry over the prediction horizon. A lighter-weight alternative would be to trigger bounded tangential excitation only when the predicted information level falls below a threshold. Such extensions would preserve the interpretability of the FIM/CRLB viewpoint while making the estimator–controller coupling explicit in real time.
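The lighter-weight, threshold-triggered alternative could look roughly like the sketch below. It is only an illustration under stated assumptions: the function name, the linear scaling of the excitation with the information deficit, and the parameters `lam_thr` and `v_exc_max` are hypothetical and do not describe the baseline + exc controller or any other controller evaluated in this study.

```python
import math

def tangential_excitation(r_hat, lam_min_pred, lam_thr, v_exc_max, direction=1.0):
    """Bounded velocity increment perpendicular to the estimated line of sight.

    r_hat        -- estimated relative position (x, y), hypothetical online input
    lam_min_pred -- predicted smallest eigenvalue of the windowed information matrix
    lam_thr      -- activation threshold (assumed > 0)
    v_exc_max    -- maximum excitation speed (saturation level)
    direction    -- +1.0 or -1.0, selects the side of the line of sight
    """
    if lam_min_pred >= lam_thr:
        return (0.0, 0.0)                   # geometry is informative enough
    rho = math.hypot(r_hat[0], r_hat[1])
    if rho == 0.0:
        return (0.0, 0.0)                   # line of sight undefined
    u = (r_hat[0] / rho, r_hat[1] / rho)    # estimated LOS unit vector
    t = (-u[1], u[0])                       # LOS rotated by +90 degrees
    # Scale with the relative information deficit and saturate at v_exc_max.
    gain = min(1.0, (lam_thr - lam_min_pred) / lam_thr)
    mag = direction * v_exc_max * gain
    return (mag * t[0], mag * t[1])

# Hypothetical values: predicted information at half the threshold.
print(tangential_excitation((3.0, 4.0), 0.25, 0.5, 0.3))
```

Because the increment is orthogonal to the estimated line of sight, it adds transverse relative velocity without directly changing the commanded radial closing speed; in practice the result would still have to respect the vehicle's speed and turn-rate limits.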

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author. The data and code repository is available at https://github.com/lukisp2/RL_UUV, accessed on 8 April 2026. A video file with three representative episodes is also available in this repository.

Conflicts of Interest

The author declares no conflicts of interest.

References

1. Naus, K.; Marchel, L.; Szymak, P.; Nowak, A. Assessment of the Accuracy of Determining the Angular Position of the Unmanned Bathymetric Surveying Vehicle Based on the Sea Horizon Image. Sensors 2019, 19, 4644.
2. Kinsey, J.C.; Eustice, R.M.; Whitcomb, L.L. A Survey of Underwater Vehicle Navigation: Recent Advances and New Challenges. In Proceedings of the IFAC Conference of Manoeuvring and Control of Marine Craft, Lisbon, Portugal, 20–22 September 2006.
3. Tan, H.; Diamant, R.; Seah, W.K.G.; Waldmeyer, M. A survey of techniques and challenges in underwater localization. Ocean Eng. 2011, 38, 1663–1676.
4. Paull, L.; Saeedi, S.; Seto, M.; Li, H. AUV Navigation and Localization: A Review. IEEE J. Ocean. Eng. 2014, 39, 131–149.
5. Chen, L.; Wang, S.; McDonald-Maier, K.; Hu, H. Towards autonomous localization and mapping of AUVs: A survey. Int. J. Intell. Unmanned Syst. 2013, 1, 97–120.
6. Bahr, A.; Leonard, J.J.; Fallon, M.F. Cooperative localization for autonomous underwater vehicles. Int. J. Robot. Res. 2009, 28, 714–728.
7. Papadopoulos, G.; Fallon, M.F.; Leonard, J.J.; Patrikalakis, N.M. Cooperative Localization of Marine Vehicles using Nonlinear State Estimation. In Proceedings of the 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems; IEEE: New York, NY, USA, 2010.
8. Fallon, M.F.; Papadopoulos, G.; Leonard, J.J.; Patrikalakis, N.M. Cooperative AUV navigation using a single maneuvering surface craft. Int. J. Robot. Res. 2010, 29, 1461–1474.
9. Allotta, B.; Costanzi, R.; Meli, E.; Pugi, L.; Ridolfi, A.; Vettori, G. Cooperative localization of a team of AUVs by a tetrahedral configuration. Robot. Auton. Syst. 2014, 62, 1228–1237.
10. Wang, S.; Chen, L.; Gu, D.; Hu, H. Cooperative Localization of AUVs Using Moving Horizon Estimation. IEEE/CAA J. Autom. Sin. 2014, 1, 68–76.
11. Gadre, A.S.; Stilwell, D.J. A Complete Solution to Underwater Navigation in the Presence of Unknown Currents Based on Range Measurements from a Single Location. In Proceedings of the 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems; IEEE: New York, NY, USA, 2005; pp. 1420–1425.
12. Olson, E.; Leonard, J.J.; Teller, S. Robust Range-Only Beacon Localization. IEEE J. Ocean. Eng. 2006, 31, 949–958.
13. McPhail, S.D.; Pebody, M. Range-Only Positioning of a Deep-Diving Autonomous Underwater Vehicle From a Surface Ship. IEEE J. Ocean. Eng. 2009, 34, 669–677.
14. Hegrenæs, Ø.; Gade, K.; Hagen, O.K.; Hagen, P.E. Underwater transponder positioning and navigation of autonomous underwater vehicles. In Proceedings of the IEEE OCEANS Conference, Biloxi, MS, USA, 26–29 October 2009.
15. Lee, P.M.; Jun, B.H. Pseudo long base line navigation algorithm for underwater vehicles with inertial sensors and two acoustic range measurements. Ocean Eng. 2007, 34, 416–425.
16. Webster, S.E.; Eustice, R.M.; Singh, H.; Whitcomb, L.L. Preliminary Deep Water Results in Single-Beacon One-Way-Travel-Time Acoustic Navigation for Underwater Vehicles. In Proceedings of the 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems; IEEE: New York, NY, USA, 2009; pp. 2053–2060.
17. Eustice, R.M.; Singh, H.; Whitcomb, L.L. Synchronous-Clock, One-Way-Travel-Time Acoustic Navigation for Underwater Vehicles. J. Field Robot. 2011, 28, 121–136.
18. Webster, S.E.; Eustice, R.M.; Singh, H.; Whitcomb, L.L. Advances in single-beacon one-way-travel-time acoustic navigation for underwater vehicles. Int. J. Robot. Res. 2012, 31, 935–950.
19. Antonelli, G.; Arrichiello, F.; Chiaverini, S.; Sukhatme, G.S. Observability Analysis of Relative Localization for AUVs Based on Ranging and Depth Measurements. In Proceedings of the 2010 IEEE International Conference on Robotics and Automation; IEEE: New York, NY, USA, 2010; pp. 4276–4281.
20. Bayat, M.; Aguiar, A.P. Observability Analysis for AUV Range-only Localization and Mapping Measures of Unobservability and Experimental Results. IFAC Proc. Vol. 2012, 45, 325–330.
21. Crasta, N.; Bayat, M.; Aguiar, A.P.; Pascoal, A. Observability analysis of 2D single-beacon navigation in the presence of constant currents for two classes of maneuvers. IFAC Proc. Vol. 2013, 46, 227–232.
22. Quenzer, B.; Morgansen, K. Observability based control in range-only underwater vehicle localization. In 2014 American Control Conference; IEEE: New York, NY, USA, 2014; pp. 4702–4707.
23. Hinson, B.; Binder, B.; Morgansen, K. Path planning to optimize observability in a planar uniform flow field. In 2013 American Control Conference; IEEE: New York, NY, USA, 2013; pp. 1392–1399.
24. Arrichiello, F.; Antonelli, G.; Aguiar, A.P.; Pascoal, A. An Observability Metric for Underwater Vehicle Localization Using Range Measurements. Sensors 2013, 13, 16191–16215.
25. Parlangeli, G.; Indiveri, G. Single range observability for cooperative underactuated underwater vehicles. Annu. Rev. Control 2015, 40, 129–141.
26. Rúa, S.; Vásquez, R.E.; Crasta, N.; Betancur, M.J.; Pascoal, A. Observability analysis for a cooperative range-based navigation system that uses a rotating single beacon. Ocean Eng. 2022, 248, 110697.
27. Gong, Z.; Li, C.; Jiang, F. AUV-Aided Joint Localization and Time Synchronization for Underwater Acoustic Sensor Networks. IEEE Signal Process. Lett. 2018, 25, 477–481.
28. Gong, Z.; Li, C.; Su, R. Fundamental Limits of Doppler Shift-Based, ToA-Based, and TDoA-Based Underwater Localization. IEEE/CAA J. Autom. Sin. 2023, 10, 1637–1639.
29. Gong, Z.; Li, C.; Jiang, F.; Zheng, J. AUV-Aided Localization of Underwater Acoustic Devices Based on Doppler Shift Measurements. IEEE Trans. Wirel. Commun. 2020, 19, 2226–2239.
30. Carroll, P.; Domrese, K.; Zhou, H.; Zhou, S.; Willett, P. Doppler-aided localization of mobile nodes in an underwater distributed antenna system. Phys. Commun. 2016, 18, 49–59.
31. Su, R.; Gong, Z.; Li, C.; Han, S. High Accuracy AUV-Aided Underwater Localization: Far-Field Information Fusion Perspective. IEEE Trans. Signal Process. 2024, 72, 1877–1891.
32. Harris, Z.J.; Whitcomb, L.L. Cooperative acoustic navigation of underwater vehicles without a DVL utilizing a dynamic process model: Theory and field evaluation. J. Field Robot. 2021, 38, 700–726.
33. Li, Y.; Wang, Y.; Yu, W.; Guan, X. Multiple Autonomous Underwater Vehicle Cooperative Localization in Anchor-Free Environments. IEEE J. Ocean. Eng. 2019, 44, 895–911.
34. Li, Y.; Yu, W.; Guan, X. Current-Aided Multiple-AUV Cooperative Localization and Target Tracking in Anchor-Free Environments. IEEE/CAA J. Autom. Sin. 2023, 10, 792–806.
35. Kay, S.M. Fundamentals of Statistical Signal Processing, Volume I: Estimation Theory; Prentice Hall: Upper Saddle River, NJ, USA, 1993.
36. Van Trees, H.L.; Bell, K.L. Detection, Estimation, and Modulation Theory, Part I: Detection, Estimation, and Linear Modulation Theory, 2nd ed.; Wiley: Hoboken, NJ, USA, 2001.
37. Tan, Y.T.; Gao, T.; Chitre, M. Cooperative Path Planning for Range-Only Localization Using a Single Moving Beacon. IEEE J. Ocean. Eng. 2014, 39, 371–385.
38. Masmitja, I.; Gomariz, S.; Del-Rio, J.; Kieft, B.; O’Reilly, T.; Bouvet, P.; Aguzzi, J. Optimal path shape for range-only underwater target localization using a wave glider. Int. J. Robot. Res. 2018, 37, 1447–1462.
39. Haarnoja, T.; Zhou, A.; Abbeel, P.; Levine, S. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. In Proceedings of Machine Learning Research, Proceedings of the 35th International Conference on Machine Learning (ICML), Stockholm, Sweden, 10–15 July 2018; ML Research Press: Cambridge, MA, USA, 2018; Volume 80, pp. 1861–1870.
40. Marchel, Ł.; Kot, R.; Szymak, P.; Piskur, P. Model-Based AUV Path Planning Using Curriculum Learning and Deep Reinforcement Learning on a Simplified Electronic Navigation Chart. Appl. Sci. 2025, 15, 6081.
41. Kober, J.; Bagnell, J.A.; Peters, J. Reinforcement Learning in Robotics: A Survey. Int. J. Robot. Res. 2013, 32, 1238–1274.
42. Placed, J.; Castellanos, J.A. A Survey on Active SLAM. Appl. Sci. 2020, 10, 8386.
43. Raffin, A.; Hill, A.; Gleave, A.; Kanervisto, A.; Ernestus, M.; Dormann, N. Stable-Baselines3: Reliable Reinforcement Learning Implementations. J. Mach. Learn. Res. 2021, 22, 1–8.
44. Zhu, R.; Li, W.; Boukerche, A.; Yang, Q. Energy-Aware DRL-Based Dual-Perception Fountain Codes for Resource-Constrained UASNs. IEEE Trans. Sustain. Comput. 2026, 11, 111–122.
Figure 1. Geometry and notation of the leader–follower problem in the horizontal plane. $\mathcal{F}_W$ denotes the inertial/world frame (ENU), and $\mathcal{F}_L$ is the leader-fixed frame aligned with the leader heading $\psi_L$. The follower and leader positions are $\mathbf{p}_F$ and $\mathbf{p}_L$, the relative position is $\mathbf{r} = \mathbf{p}_L - \mathbf{p}_F$, the range is $\rho = \|\mathbf{r}\|$, and the line-of-sight unit vector is $\mathbf{u} = \mathbf{r}/\rho$. The vehicle velocities are $\mathbf{v}_F$ and $\mathbf{v}_L$, the relative velocity is $\mathbf{v}_{rel} = \mathbf{v}_L - \mathbf{v}_F$, and the desired formation offset is defined as $\mathbf{r}_{des}^{L}$ in $\mathcal{F}_L$ and represented in the world frame as $\mathbf{r}_{des}^{W} = R(\psi_L)\,\mathbf{r}_{des}^{L}$.
Figure 2. Illustration of Doppler-only geometry sensitivity in 2D. The LOS vector is $\mathbf{u} = \mathbf{r}/\rho$, where $\mathbf{r} = \mathbf{p}_L - \mathbf{p}_F$ and $\rho = \|\mathbf{r}\|$. The transverse component $\mathbf{v}_\perp = (I_2 - \mathbf{u}\mathbf{u}^\top)\,\mathbf{v}_{rel}$, with magnitude $v_\perp = \|\mathbf{v}_\perp\|$, determines the Jacobian norm $\|H\|_2 = v_\perp/\rho$. (a) In nearly radial motion, $\mathbf{v}_{rel} \parallel \mathbf{u}$, hence $v_\perp \approx 0$ and $\|H\| \approx 0$; (b) with significant transverse excitation, $v_\perp > 0$, the sensitivity increases and the measurement carries information about $\mathbf{r}$.
Figure 3. Number of used Doppler samples $N_{used}$ within the window $T_w$ (triptych: straight/turn/sine). Angular axis: navigation bearing $\beta_0$ (0° = N, 90° = E, clockwise). Radial axis: initial range $\rho_0$. Low $N_{used}$ indicates weak/poor geometry due to gating (typically near-radial motion, insufficient relative speed, or insufficient transverse excitation), whereas high $N_{used}$ indicates that informative samples are retained over most of the window.
Figure 4. Map of $\log_{10}\!\left[\lambda_{\min}(J_w)/(1\ \mathrm{m}^{-2})\right]$ for window $T_w$ (shared color scale across the three maneuvers). Larger values indicate better information conditioning in the worst direction, whereas very small values indicate poor geometry and near-singular finite-window information.
Figure 5. Map of $\log_{10}\!\left[\mathrm{tr}(C_w)/(1\ \mathrm{m}^{2})\right]$ for window $T_w$ (shared color scale across the three maneuvers). Smaller values correspond to a lower bound on the estimation error and hence better geometry, whereas large values indicate poor geometry and weak finite-window informativeness.
Figure 6. High-level simulation–estimation–control loop used in the experiments. The simulator propagates the ground-truth leader and follower motion and generates the online signals processed by the EKF bank. The controller (baseline, baseline + excitation, or RL) uses only online quantities, i.e., the relative-state estimate, uncertainty, and estimator diagnostics. Truth-based errors and sliding-window FIM/CRLB measures are computed only in the evaluation branch and are not used by the controller or the reward.
Figure 7. EKF bank at times $t_{k-1}$ and $t_k$ after a few Doppler measurements.
Figure 8. Geometric interpretation of the baseline: a formation controller in velocity space and an excitation component tangential to the LOS, activated by an increase in estimation uncertainty.
Figure 9. RL policy training (SAC) based on TensorBoard logs. From top: (a) mean episodic reward; (b) formation error computed on the estimate $e_k$; (c) estimate uncertainty as $\sigma_{\max,k} = \sqrt{\lambda_{\max}(P_k)}$; (d) mean transverse excitation $\bar{v}_\perp$ (a proxy for the condition $v_\perp > 0$ required for Doppler-only informativeness). These curves are treated as training diagnostics; the actual performance evaluation and comparison to the baseline are reported in Section 8 and Section 9.
Figure 10. Relationship between transverse excitation $v_\perp$ (online signal from the estimator) and windowed information $\lambda_{\min}(J_w)$ (computed on ground truth for evaluation). The points come from all episodes and time steps (with cases $\rho_{true} \le \rho_{\min}$ masked out).
Figure 11. Comparison of control performance. (Left): success rate over 300 Monte Carlo episodes. (Right): empirical cumulative distribution function (ECDF) of time-to-success (computed only for episodes that ended in success).
Figure 12. Formation error computed on “truth”. (Left): median over time (shading: interquartile range (IQR)) across episodes active at a given time. (Right): ECDF of the final formation error for episodes that ended in success.
Figure 13. Information metrics over time (median/IQR (shading) across episodes active at a given time). (Left): $\lambda_{\min}(J_w)$ in the window. (Right): $\mathrm{tr}(C_w)$ in the window. The logarithmic scales highlight episodes of information degeneracy in the Doppler-only regime.
Figure 14. Example episode: trajectories in $\mathcal{F}_W$. Shown are the leader position $\mathbf{p}_L$ (black), the desired follower trajectory $\mathbf{p}_{F,des}$ (gray dashed), and the actual follower trajectories $\mathbf{p}_F$ for the compared strategies. Curves may terminate at the episode end (online success criterion met or timeout; both are marked with an “x”).
Figure 15. Example episode (Doppler-only): linking task quality, estimate uncertainty, information metrics, and follower motion signals for the compared strategies. (a) Formation error on “truth” $e_{form,true}(t) = \|\mathbf{r}_{true} - \mathbf{r}_{des}^{W}\|$; (b) online estimate uncertainty $\sigma_{\max}(t) = \sqrt{\lambda_{\max}(P(t))}$; (c) informativeness and excitation: $\lambda_{\min}(J_w)$ (solid lines, left axis) and $\bar{v}_\perp$ (dashed lines, right axis); (d) follower speed $v_F(t)$; (e) follower course $\psi_F(t)$; (f) follower angular rate (heading change) $\omega_F(t)$.
Table 1. Positioning of the present work relative to the most relevant research lines.
| Research Line | Representative References | Main Observable(s) | Main Emphasis and Relation to the Present Paper |
| --- | --- | --- | --- |
| General underwater navigation and localization surveys | [2,3,4,5] | INS/DVL, acoustic aids, SLAM | Provide broad background on underwater navigation architectures and trade-offs, but do not analyze Doppler-only leader–follower information limitations. |
| Cooperative underwater localization with maneuvering reference vehicles | [6,7,8] | Inter-vehicle acoustic aiding, mainly range-based cooperation | Demonstrate the value of cooperative relative localization with limited acoustic links, including maneuvering aiding platforms, but do not focus on the Doppler-only finite-horizon geometry problem studied here. |
| Alternative cooperative configurations and estimators | [9,10] | Cooperative range-based measurements | Explore structured multi-AUV geometries and moving-horizon estimation, showing that estimator architecture matters; however, they do not isolate Doppler-only range-rate limitations. |
| Infrastructure-reduced single-beacon/range-based navigation | [11,12,13,14,15,16,17,18] | Range, OWTT/TWTT, single-beacon measurements | Reduce external infrastructure requirements, yet still rely on direct range or travel-time observables that provide stronger geometric constraints than Doppler-only range-rate. |
| Observability-aware relative/range-only localization | [19,20,21,22,23,24,25,26] | Range-only, single-range, range + depth | Closest conceptual line of work: identifies geometry-induced degeneracies and motivates excitation-aware motion design; however, the measurement model differs fundamentally from Doppler-only sensing. |
| Doppler-based underwater localization and information limits | [27,28,29,30,31] | Doppler, Doppler-aided localization, ToA/TDoA comparisons | Closest sensing-modality line. These works establish the usefulness and limits of Doppler-based localization, but do not explicitly study the leader–follower Doppler-only formation problem with sliding-window FIM/CRLB maps and controller interpretation. |
| Recent cooperative/anchor-free/current-aided underwater navigation | [32,33,34] | Dynamic-process models, cooperative ranging, current information | Show recent extensions of cooperative underwater navigation without full sensor suites or in anchor-free/current-aided settings; however, they rely on additional references or auxiliary information beyond isolated Doppler-only range-rate. |
| This work | | Doppler-only range-rate | Studies leader–follower relative navigation without direct range, quantifies finite-horizon informativeness via sliding-window FIM/CRLB and information maps, and evaluates motion strategies that trade off formation keeping against estimator-informative excitation. |
Table 2. Summary of core notation used in the model and the information analysis.

| Symbol | Description | Unit |
|---|---|---|
| $\mathcal{F}_W$ | Inertial/world frame (ENU) | – |
| $\mathcal{F}_L$ | Leader-fixed frame aligned with the leader heading $\psi_L$ | – |
| $\mathbf{p}_L, \mathbf{p}_F$ | Leader/follower position in $\mathcal{F}_W$ | m |
| $\mathbf{r} = \mathbf{p}_L - \mathbf{p}_F$ | Relative position vector (leader w.r.t. follower) | m |
| $\hat{\mathbf{r}}$ | Relative position estimate | m |
| $\mathbf{r}_{\mathrm{des}}^{L}$ | Desired relative offset expressed in $\mathcal{F}_L$ | m |
| $\mathbf{r}_{\mathrm{des}}^{W}$ | Desired relative offset expressed in $\mathcal{F}_W$ | m |
| $\rho = \lVert \mathbf{r} \rVert$ | Inter-vehicle range (distance) | m |
| $\mathbf{u} = \mathbf{r}/\rho$ | LOS unit vector | – |
| $\mathbf{v}_L, \mathbf{v}_F$ | Leader/follower velocities | m s⁻¹ |
| $\mathbf{v}_{\mathrm{rel}} = \mathbf{v}_L - \mathbf{v}_F$ | Relative velocity | m s⁻¹ |
| $\tilde{\mathbf{v}}_F$ | Noisy follower-velocity reconstruction from dead reckoning | m s⁻¹ |
| $\tilde{\mathbf{v}}_{\mathrm{rel}}$ | Reconstructed relative velocity used by the estimator, $\tilde{\mathbf{v}}_{\mathrm{rel}} = \mathbf{v}_L - \tilde{\mathbf{v}}_F$ | m s⁻¹ |
| $\psi_L, \psi_F$ | Leader/follower headings | rad |
| $\dot{\rho}$ | Range-rate | m s⁻¹ |
| $\bar{\dot{\rho}}_m$ | Effective time-averaged range-rate over the integration window around $t_m$ | m s⁻¹ |
| $s_m$ | Doppler measurement (closing speed convention) at time $t_m$ | m s⁻¹ |
| $s_m^{\mathrm{eff}}$ | Effective Doppler measurement over the integration window | m s⁻¹ |
| $\sigma_s$ | Standard deviation of Doppler measurement noise | m s⁻¹ |
| $\sigma_{s,\mathrm{filt}}$ | Doppler measurement-noise standard deviation assumed by the filter | m s⁻¹ |
| $\mathbf{w}_k$ | Process/model disturbance in the discrete relative-motion model | m |
| $\mathbf{Q}_k$ | Process-noise covariance in the discrete relative-motion model | m² |
| $q$ | Process-noise intensity parameter, $\mathbf{Q}_k = q^2 \Delta t\, \mathbf{I}_2$ | $\mathrm{m\,s^{-1/2}}$ |
| $\nu_m$ | Additive Doppler measurement noise at time $t_m$ | m s⁻¹ |
| $\nu_m^{\mathrm{eff}}$ | Effective measurement error absorbing receiver/channel nonidealities | m s⁻¹ |
| $h(\mathbf{r}, \mathbf{v}_{\mathrm{rel}})$ | Doppler measurement function | m s⁻¹ |
| $\mathbf{H} = \partial h / \partial \mathbf{r}$ | Jacobian of the Doppler measurement w.r.t. $\mathbf{r}$ | s⁻¹ |
| $\mathbf{H}_m$ | Jacobian $\mathbf{H}$ evaluated at measurement time $t_m$ | s⁻¹ |
| $\mathbf{v}_{\perp}$ | Transverse component of $\mathbf{v}_{\mathrm{rel}}$ w.r.t. the LOS $\mathbf{u}$ | m s⁻¹ |
| $v_{\perp} = \lVert \mathbf{v}_{\perp} \rVert$ | Magnitude of the transverse component | m s⁻¹ |
| $\mathbf{J}_m$ | Single-sample Fisher information increment | m⁻² |
| $\mathbf{J}_w$ | Sliding-window Fisher information matrix (FIM) | m⁻² |
| $\mathbf{C}_w$ | CRLB bound, $(\mathbf{J}_w + \epsilon \mathbf{I}_2)^{-1}$ | m² |
| $\mathbf{R}(\psi)$ | 2D rotation matrix | – |
| $\mathbf{I}_2$ | $2 \times 2$ identity matrix | – |
| $\Delta t$ | Integration/discretization step | s |
| $T_s$ | Doppler sampling period | s |
| $T_{\mathrm{int}}$ | Doppler receiver integration-window duration | s |
| $T_w$ | Sliding-window horizon | s |
| $t_m$ | Doppler measurement instant, $t_m = m T_s$ | s |
| $W(t)$ | Time window used in the sliding-window analysis | s |
| $\epsilon$ | CRLB regularization parameter | – |
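
To illustrate how the symbols in Table 2 fit together, the following minimal sketch evaluates a Doppler measurement function and its Jacobian for a single geometry and cross-checks the Jacobian numerically. It assumes that the closing-speed convention means $h = -\dot{\rho} = -\mathbf{u}^{\top}\mathbf{v}_{\mathrm{rel}}$ (positive when the range decreases); the function names and the numerical values are hypothetical and are not taken from the paper.

```python
import math

def doppler_h(r, v_rel):
    """Closing speed h = -u . v_rel (assumed sign convention)."""
    rho = math.hypot(r[0], r[1])
    u = (r[0] / rho, r[1] / rho)
    return -(u[0] * v_rel[0] + u[1] * v_rel[1])

def doppler_jacobian(r, v_rel):
    """Analytic Jacobian dh/dr = -(1/rho) * v_perp, a row vector in 1/s."""
    rho = math.hypot(r[0], r[1])
    u = (r[0] / rho, r[1] / rho)
    radial = u[0] * v_rel[0] + u[1] * v_rel[1]
    v_perp = (v_rel[0] - radial * u[0], v_rel[1] - radial * u[1])
    return (-v_perp[0] / rho, -v_perp[1] / rho)

def numeric_jacobian(r, v_rel, eps=1e-6):
    """Central finite differences, used only to cross-check the analytic form."""
    grads = []
    for i in range(2):
        r_plus = list(r)
        r_minus = list(r)
        r_plus[i] += eps
        r_minus[i] -= eps
        diff = doppler_h(r_plus, v_rel) - doppler_h(r_minus, v_rel)
        grads.append(diff / (2.0 * eps))
    return tuple(grads)

# Hypothetical geometry: leader 100 m east of the follower.
r = (100.0, 0.0)        # m
v_rel = (0.3, 1.0)      # m/s, mostly transverse to the line of sight
H = doppler_jacobian(r, v_rel)
H_num = numeric_jacobian(r, v_rel)
```

Under this convention only the transverse part of the relative velocity enters the Jacobian, and its magnitude scales as $v_{\perp}/\rho$, which is consistent with the unit s⁻¹ listed for $\mathbf{H}$.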
Table 3. Angle conventions used in the maps: mathematical angle $\varphi$ (computations) and navigation bearing $\beta$ (presentation).

| Direction | $\varphi$ (0° = E, CCW) | $\beta$ (0° = N, CW) |
|---|---|---|
| North ($+y$) | 90° | 0° |
| East ($+x$) | 0° | 90° |
| South ($-y$) | −90° (or 270°) | 180° |
| West ($-x$) | 180° (or −180°) | 270° |
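
The rows of Table 3 are consistent with the mapping $\beta = (90° - \varphi) \bmod 360°$. The short sketch below implements this conversion in both directions; the function names are hypothetical and chosen only for illustration.

```python
def math_angle_to_bearing(phi_deg):
    """Map phi (0 deg = East, CCW) to beta (0 deg = North, CW), in [0, 360)."""
    return (90.0 - phi_deg) % 360.0

def bearing_to_math_angle(beta_deg):
    """Inverse map, returned in the half-open interval (-180, 180]."""
    phi = (90.0 - beta_deg) % 360.0
    return phi - 360.0 if phi > 180.0 else phi

# Reproduce the four cardinal directions listed in Table 3.
for name, phi in [("North", 90.0), ("East", 0.0), ("South", -90.0), ("West", 180.0)]:
    print(name, phi, math_angle_to_bearing(phi))
```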
Table 4. Gating criteria and derived quantities used in the information maps.

| Item | Definition | Value/Threshold |
|---|---|---|
| Transverse component $\mathbf{v}_{\perp}$ | $\mathbf{v}_{\perp} = \mathbf{v}_{\mathrm{rel}} - (\mathbf{u}^{\top}\mathbf{v}_{\mathrm{rel}})\,\mathbf{u}$ | – |
| Jacobian $\mathbf{H}$ | $\mathbf{H} = -(1/\rho)\,\mathbf{v}_{\perp}^{\top}$ | – |
| Sample-usage indicator $\delta_k$ | $\delta_k = 1$ if gating conditions are met; otherwise $\delta_k = 0$ | – |
| Gating: minimum range | $\rho_k > \rho_{\min}$ | $\rho_{\min} = 20$ m |
| Gating: minimum relative speed | $\lVert \mathbf{v}_{\mathrm{rel},k} \rVert > v_{\min}$ | $v_{\min} = 0.05$ m s⁻¹ |
| Gating: minimum transverse excitation | $v_{\perp,k} > v_{\perp,\min}$ | $v_{\perp,\min} = 0.25$ m s⁻¹ |
| Number of used samples | $N_{\mathrm{used}} = \sum_{k=0}^{K-1} \delta_k$ | $0 \le N_{\mathrm{used}} \le K$ |
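
The gating rules in Table 4 can be sketched as a filter applied before each sample contributes to the sliding-window FIM. The example below assumes additive Gaussian Doppler noise with variance $\sigma_s^2$, so that each accepted sample adds $\mathbf{H}_k^{\top}\mathbf{H}_k/\sigma_s^2$ to $\mathbf{J}_w$, and it assumes $\mathbf{H}_k = -(1/\rho_k)\,\mathbf{v}_{\perp,k}^{\top}$; this additive form and the helper names are illustrative assumptions, not the paper's implementation.

```python
import math

RHO_MIN = 20.0       # m, minimum range (Table 4)
V_MIN = 0.05         # m/s, minimum relative speed (Table 4)
V_PERP_MIN = 0.25    # m/s, minimum transverse excitation (Table 4)

def gated_window_fim(samples, sigma_s=0.015):
    """Accumulate J_w = sum_k delta_k * H_k^T H_k / sigma_s^2 over one window.

    samples: iterable of (r, v_rel) pairs, each a 2-tuple in m and m/s.
    Returns (J_w as ((a, b), (b, c)), number of used samples).
    """
    a = b = c = 0.0
    n_used = 0
    for r, v_rel in samples:
        rho = math.hypot(r[0], r[1])
        speed = math.hypot(v_rel[0], v_rel[1])
        if rho <= RHO_MIN or speed <= V_MIN:
            continue  # delta_k = 0: too close or almost no relative motion
        u = (r[0] / rho, r[1] / rho)
        radial = u[0] * v_rel[0] + u[1] * v_rel[1]
        v_perp = (v_rel[0] - radial * u[0], v_rel[1] - radial * u[1])
        if math.hypot(v_perp[0], v_perp[1]) <= V_PERP_MIN:
            continue  # delta_k = 0: motion is too radial to be informative
        h = (-v_perp[0] / rho, -v_perp[1] / rho)
        a += h[0] * h[0] / sigma_s ** 2
        b += h[0] * h[1] / sigma_s ** 2
        c += h[1] * h[1] / sigma_s ** 2
        n_used += 1
    return ((a, b), (b, c)), n_used

def min_eigenvalue(J):
    """Smallest eigenvalue of a symmetric 2x2 matrix."""
    (a, b), (_, c) = J
    return 0.5 * (a + c) - math.sqrt((0.5 * (a - c)) ** 2 + b * b)

# Hypothetical window: one purely radial sample and two transverse samples
# seen from different line-of-sight directions.
window = [((100.0, 0.0), (1.0, 0.0)),
          ((100.0, 0.0), (0.0, 1.0)),
          ((0.0, 100.0), (1.0, 0.0))]
J_w, n_used = gated_window_fim(window)
lam_min = min_eigenvalue(J_w)
```

In this toy window the radial sample is rejected by the transverse-excitation gate, and the two remaining samples make $\mathbf{J}_w$ full rank only because they view the geometry from different line-of-sight directions. A single accepted sample yields a rank-one matrix with $\lambda_{\min}(\mathbf{J}_w) = 0$.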
Table 5. Parameters used to generate the maps in Figure 3, Figure 4 and Figure 5.

| Parameter | Value | Notes |
|---|---|---|
| Window/horizon $T_w$ | 30 s | Window for $\mathbf{J}_w$, $\mathbf{C}_w$ |
| Sampling period $T_s$ | 1.0 s | $K = T_w / T_s = 30$ samples |
| Doppler noise $\sigma_s$ | 0.015 m s⁻¹ | Variance $R = \sigma_s^2$ |
| $\lVert \mathbf{v}_{\mathrm{rel}} \rVert$ at $t = 0$ | 1.0 m s⁻¹ | $\mathbf{v}_0 = [\lVert \mathbf{v}_{\mathrm{rel}} \rVert, 0]^{\top}$ (East at $t = 0$) |
| Turn: turn rate $\omega$ | 8° s⁻¹ | $\mathbf{v}_{\mathrm{rel}}(t) = \mathbf{R}(\omega t)\,\mathbf{v}_0$ |
| Sine: amplitude $A$ | 35° | $\mathbf{v}_{\mathrm{rel}}(t) = \mathbf{R}(A \sin(2\pi f t))\,\mathbf{v}_0$ |
| Sine: frequency $f$ | 0.05 Hz | – |
| Initial range grid $\rho_0$ | 25 m to 220 m | $N_{\rho} = 100$ |
| Angular grid (computations) $\varphi$ | [−180°, 180°] | $N_{\varphi} = 181$, step 2° |
| Angular grid (presentation) $\beta_0$ | 0° to 360° | Effectively $N_{\beta} = 180$ (no duplicated endpoint) |
| Regularization $\epsilon$ | $10^{-12}$ | Stabilizes inversion of $\mathbf{J}_w$ |
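
As a rough sketch of the two maneuvers parameterized in Table 5, the code below generates the relative-velocity sequence at the $K = 30$ sample instants of one window. It assumes that $\mathbf{R}(\theta)$ is the standard counter-clockwise rotation and that samples are taken at $t_k = k T_s$ for $k = 0, \dots, K-1$; both choices, along with the function names, are assumptions made for illustration.

```python
import math

T_W, T_S = 30.0, 1.0             # s, window and sampling period (Table 5)
K = int(round(T_W / T_S))        # 30 samples per window
V0 = (1.0, 0.0)                  # m/s, relative velocity pointing East at t = 0
OMEGA = math.radians(8.0)        # rad/s, turn rate of the "turn" maneuver
AMP = math.radians(35.0)         # rad, heading amplitude of the "sine" maneuver
FREQ = 0.05                      # Hz, frequency of the "sine" maneuver

def rotate(theta, v):
    """Apply the 2D counter-clockwise rotation R(theta) to the vector v."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

def v_rel_turn(t):
    """Constant-rate turn: v_rel(t) = R(omega * t) v0."""
    return rotate(OMEGA * t, V0)

def v_rel_sine(t):
    """Sinusoidal weaving: v_rel(t) = R(A * sin(2 pi f t)) v0."""
    return rotate(AMP * math.sin(2.0 * math.pi * FREQ * t), V0)

times = [k * T_S for k in range(K)]
turn_profile = [v_rel_turn(t) for t in times]
sine_profile = [v_rel_sine(t) for t in times]
```

With these values the turn maneuver rotates the relative-velocity direction by 240° over the 30 s window, whereas the sine maneuver completes 1.5 periods while staying within ±35° of East. Both keep the relative speed constant at 1.0 m s⁻¹.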
Table 6. EKF-bank and rejuvenation parameters used in the experiments.

| Parameter | Value |
|---|---|
| $N$ | 8 |
| $\tau$ | 1.5 |
| $w_{\min}$ | $1 \times 10^{-4}$ |
| $\alpha$ | 0.002 |
| $\eta_{\mathrm{ESS}}$ | 0.55 |
| $w_{\mathrm{trig}}$ | 0.90 |
| $\sigma_{\mathrm{jit}}$ | 6 m |
| $\sigma_{\mathrm{inf}}$ | 10 m |
Table 7. Key simulation and noise parameters used in the Monte Carlo protocol (hard mode in evaluation).

| Parameter | Value |
|---|---|
| Control step $\Delta t_{\mathrm{act}}$ | 2 s |
| Integration step $\Delta t$ | 0.1 s |
| Doppler measurement period $T_s$ | 1 s |
| Episode time limit $T_{\max}$ | 400 s (200 control steps) |
| Doppler noise (truth) $\sigma_s$ | 0.015 m s⁻¹ |
| Follower heading error (dead-reckoning) $\sigma_{\psi}$ | 0.5° |
| Follower speed error (dead-reckoning) $\sigma_v$ | 0.01 m s⁻¹ |
| Desired relative offset in $\mathcal{F}_L$: $\mathbf{r}_{\mathrm{des}}^{L}$ | $[100, 0]^{\top}$ m |
| Initial range $\rho_0$ (hard) | 80 m to 220 m |
| Number of EKF-bank hypotheses $N$ | 8 |
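
The desired offset in Table 7 is expressed in the leader frame, whereas the formation error in Figure 15a is evaluated in the world frame. A minimal sketch of that conversion follows, assuming $\mathbf{r}_{\mathrm{des}}^{W} = \mathbf{R}(\psi_L)\,\mathbf{r}_{\mathrm{des}}^{L}$ with $\psi_L$ measured counter-clockwise from the East axis. The heading convention is an assumption here, and the function names and positions are hypothetical.

```python
import math

R_DES_L = (100.0, 0.0)   # m, desired offset in the leader frame (Table 7)

def desired_offset_world(psi_leader):
    """Rotate the leader-frame offset into the world frame (assumed CCW from East)."""
    c, s = math.cos(psi_leader), math.sin(psi_leader)
    return (c * R_DES_L[0] - s * R_DES_L[1], s * R_DES_L[0] + c * R_DES_L[1])

def formation_error(p_leader, p_follower, psi_leader):
    """Norm of r - r_des^W, with r = p_L - p_F as defined in Table 2."""
    r = (p_leader[0] - p_follower[0], p_leader[1] - p_follower[1])
    r_des = desired_offset_world(psi_leader)
    return math.hypot(r[0] - r_des[0], r[1] - r_des[1])

# Hypothetical snapshot: leader at the origin heading North,
# follower 97 m to the south and 4 m to the east of it.
err = formation_error((0.0, 0.0), (4.0, -97.0), math.radians(90.0))
```

In this snapshot the follower is 3 m too close along track and 4 m off to the side, so the formation error evaluates to 5 m. With the step sizes in Table 7, each 2 s control step spans 20 integration steps and 2 Doppler measurements.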
Table 8. Compared motion strategies and their role in the study.

| Strategy | Online Inputs | Excitation Mechanism | Role in the Study |
|---|---|---|---|
| Baseline (P) | $\hat{\mathbf{r}}$, $\mathbf{P}$, and leader velocity | None | Geometry-agnostic formation-control reference |
| Baseline+exc | $\hat{\mathbf{r}}$, $\mathbf{P}$, and leader velocity | Explicit tangential excitation when uncertainty increases | Hand-designed excitation-aware baseline |
| RL (SAC) | $\hat{\mathbf{r}}$, $\mathbf{P}$, diagnostics, and leader velocity | Learned from reward trade-off | Learned excitation-aware policy |
| Random | None beyond action bounds | Stochastic | Non-intelligent reference |
Table 9. Summary of the strategy comparison (300 Monte Carlo episodes). Success and $t_{\mathrm{succ}}$ are computed using the online criterion (estimate and uncertainty). $e_{\mathrm{form,true}}$ is the formation error on “truth” at the end of the episode (median over successful episodes). $\bar{v}_{\perp}$ is the average transverse excitation reported by the estimator (over steps with $\rho_{\mathrm{true}} > \rho_{\min}$). Information metrics: median $\log_{10} \lambda_{\min}(\mathbf{J}_w)$ and $\log_{10} \mathrm{tr}(\mathbf{C}_w)$ (over the same steps), and $p_{\mathrm{deg}}$ is the fraction of steps with $\lambda_{\min}(\mathbf{J}_w) \le 10^{-10}$.

| Strategy | Succ [%] | $t_{\mathrm{succ}}$ [s] | $e_{\mathrm{form,true}}$ [m] | $\bar{v}_{\perp}$ [m s⁻¹] | med $\log_{10} \lambda_{\min}(\mathbf{J}_w)$ | med $\log_{10} \mathrm{tr}(\mathbf{C}_w)$ | $p_{\mathrm{deg}}$ |
|---|---|---|---|---|---|---|---|
| RL (SAC) | 87.3 | 114 | 3.44 | 1.26 | −1.73 | 1.73 | 0.20 |
| Baseline (P) | 57.7 | 202 | 4.55 | 0.34 | −15.93 | 9.30 | 0.52 |
| Baseline+exc | 62.3 | 204 | 4.53 | 0.37 | −14.65 | 9.00 | 0.51 |
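
The two information summaries in Table 9 can be computed from a per-step sequence of $\lambda_{\min}(\mathbf{J}_w)$ values roughly as sketched below. The threshold $10^{-10}$ comes from the caption; the non-strict inequality, the small positive floor applied before taking the logarithm, the function name, and the sample values are assumptions for illustration only and are not data from the paper.

```python
import math
import statistics

DEG_THRESHOLD = 1e-10   # degeneracy threshold on lambda_min(J_w) (Table 9 caption)

def summarize_information(lambda_min_values, floor=1e-300):
    """Return (median log10 lambda_min, fraction of degenerate steps).

    The floor only guards against log10(0); it is an assumption of this sketch.
    """
    logs = [math.log10(max(lam, floor)) for lam in lambda_min_values]
    n_deg = sum(1 for lam in lambda_min_values if lam <= DEG_THRESHOLD)
    return statistics.median(logs), n_deg / len(lambda_min_values)

# Hypothetical per-step eigenvalues for a short episode.
med_log, p_deg = summarize_information([1e-2, 1e-1, 1e-12, 1e-16, 1e-3])
```

Because the median is taken on a logarithmic scale, a strategy that spends more than half of its steps in near-degenerate geometry ends up with a median far below the threshold, which matches the pattern of the two baseline rows in Table 9.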

MDPI and ACS Style

Marchel, Ł. Observability and Information Bounds in UUV Relative Navigation from Range-Rate. Appl. Sci. 2026, 16, 3758. https://doi.org/10.3390/app16083758

Chicago/Turabian Style

Marchel, Łukasz. 2026. "Observability and Information Bounds in UUV Relative Navigation from Range-Rate" Applied Sciences 16, no. 8: 3758. https://doi.org/10.3390/app16083758
