1. Introduction
In ice-covered and otherwise challenging ocean environments, satellite navigation is largely unavailable and both geomagnetic cues and acoustic propagation can vary unpredictably [
1,
2]. During long missions and terminal docking, an autonomous underwater vehicle (AUV) must therefore rely primarily on onboard sensors—IMU, DVL, odometry, and a depth sensor—for inertial and relative navigation [
3,
4]. Although such proprioceptive measurements are high-rate and timely, they are vulnerable to inertial drift, attitude misalignment, and systematic sensor biases; errors accumulate with mission duration and often manifest at docking as systematic offsets in position and attitude [
5,
6]. By contrast, exteroceptive observations tied to a global reference—for example, relative range/bearing provided by a mobile docking station (MDS)—can bound this drift and are critical to achieving accurate and robust terminal alignment [
7,
8]. However, underwater acoustic links are low-bandwidth and slow, with significant delay and jitter [
9,
10]. As a result, station measurements often arrive late, out of order, or intermittently [
11,
12]. If used directly to correct the current state, such data induce time misalignment, estimator jumps, and loss of filter consistency, and can even trigger numerical instability [
11,
12,
13,
14].
Against this backdrop, cooperative docking between an AUV and a mobile docking station (MDS) is compelling [
15,
16]. The MDS supplies globally referenced observations that periodically recalibrate the AUV state, curbing long-horizon drift; at the terminal stage, maintaining both platforms’ estimates in a common reference frame reduces alignment error and improves convergence speed and stability [
7,
8,
17,
18,
19]. The challenge is to exploit the value of MDS observations under low-rate, unreliable communications without degrading the filter with delayed/out-of-sequence data, and to achieve joint estimation of both platforms in a unified frame while preserving closed-loop consistency.
Prior work largely follows three directions. First, for out-of-sequence measurements (OOSM), common strategies include timestamp alignment with “as-if-current” updates, offline smoothing (e.g., RTS smoothing), and history-based backward update with forward recomputation [
11,
12,
13,
14]. The first ignores covariance evolution and is prone to inconsistency; the second helps with timing but offers limited support for real-time closed-loop control. Second, asynchronous multi-sensor fusion often fixes the measurement covariance within an EKF framework or gates outliers with heuristic thresholds [
20]. In acoustic settings where delay and link quality vary over time, fixed covariances fail to reflect changing reliability, and fusion weights can become decoupled from true information content. Third, for cooperative localization and docking, many systems adopt a “loosely coupled” approach: the AUV and the MDS filter independently and only register trajectories or align coordinates near the end of the mission [
7,
8,
15,
16,
17,
18,
19]. This leaves relative drift to accumulate and makes closed-loop, real-time consistency at docking difficult.
To address these issues, we propose a practical cooperative estimation framework for ocean deployment that integrates a delay-compensated extended Kalman filter (DC-EKF) and a collaborative augmented EKF (Co-Aug-EKF). The core ideas are: (i) maintain a ring buffer of historical states and covariances; when a delayed measurement arrives, perform a rigorous, consistency-preserving update at its original timestamp, then forward-replay the corrected state and covariance to the present, absorbing “late information” without breaking consistency; (ii) introduce a delay threshold and dynamic confidence allocation: measurements exceeding a benefit–cost knee are softly down-weighted or discarded; within the threshold, adapt the measurement covariance using innovation statistics (e.g., Mahalanobis distance/normalized innovation squared, NIS) and link-quality proxies (delay magnitude/jitter) so that fusion weights track information quality; (iii) formulate a cooperative augmented model that estimates AUV and MDS states in a unified frame, explicitly embedding relative kinematics and bidirectional observations in a single filter loop to suppress relative drift and improve alignment consistency at docking.
Relative to existing approaches, our contributions are threefold. First, the DC-EKF handles OOSM in real time, avoiding the inconsistency of treating delayed measurements as current; numerical stability is maintained via Joseph-form covariance updates and spectral projection when needed. Second, we develop dynamic, statistics-driven confidence weighting so that fusion weights adapt to both link delay and residual quality, balancing robustness with information efficiency. Third, the Co-Aug-EKF unifies the two platforms’ states and relative observations in one estimator, enabling closed-loop consistent, real-time cooperative estimation and providing docking-ready aligned states. The framework exposes a clear engineering interface: MDS observations are first time-rectified and consistently absorbed by the DC-EKF, then passed as observations into the collaborative augmented filter, forming an end-to-end loop from delay robustness to coordinate unification.
All numerical simulations and analyses were implemented in Python (version 3.10) with NumPy (version 1.24). The main contributions are summarized as follows:
- (1) A DC-EKF for real-time delay compensation that absorbs delayed/out-of-sequence measurements via ring-buffered backward updates and forward replay under filter-consistency constraints, preventing time-misalignment jumps.
- (2) A delay threshold with dynamic confidence assignment that adapts measurement covariance using innovation statistics and link-delay features, yielding fusion weights matched to information quality with robust gating.
- (3) A Co-Aug-EKF that achieves closed-loop consistent estimation of AUV–MDS states in a unified frame under relative kinematic constraints, significantly reducing terminal relative drift.
- (4) An engineering pathway and consistency checks (NIS/NEES), together with numerical stabilization strategies, that support reliable online operation.
The remainder of this paper is organized as follows.
Section 2 reviews related work on AUV navigation, underwater acoustic localization, delayed/OOS measurement handling, and cooperative estimation.
Section 3 specifies the cooperative AUV–MDS system, motion and measurement models, and the noise/delay assumptions, and formulates the estimation problem.
Section 4 describes the DC-EKF and ring-buffer-based delay compensation.
Section 5 introduces delay threshold calibration and adaptive confidence weighting.
Section 6 presents the Co-Aug-EKF and cooperative docking experiments together with a scalability discussion.
Section 7 summarizes the main findings, discusses practical limitations, and outlines future work.
In summary, the objective is a real-time estimation scheme for AUV–MDS cooperative docking in GNSS-denied ocean environments that jointly achieves delay robustness and closed-loop consistency. By consistently absorbing delayed/out-of-sequence data, modeling measurement reliability adaptively, and estimating both platforms in an integrated fashion, the proposed methods strike a robust balance between practicality and estimator credibility, offering a reusable pathway to high-precision docking in complex ocean conditions.
2. Related Work
AUV state estimation has long relied on strapdown inertial navigation (SINS/IMU), DVL, depth, and odometry within a combined navigation framework, where Bayesian estimators—most prominently the extended Kalman filter (EKF)—dominate sea trials and engineering systems [
21,
22,
23]. Typical designs constrain kinematic evolution through the process model, correct IMU drift using DVL/depth measurements, and tune measurement covariances across operating conditions to trade off robustness and convergence speed [
23,
24]. To better cope with strong nonlinearity, attitude coupling, and error propagation, recent work has introduced Lie-group/Lie-algebra error formulations and invariant filtering to reduce linearization sensitivity and improve attitude consistency [
21,
22,
25,
26]. Mechanisms for event-triggered updates and correlated-noise handling have been explored to enhance disturbance rejection [
27,
28]. Data-driven components have also been used to complement the model—for example, learning DVL-related dynamics to improve short-horizon prediction and inference in complex waters [
29,
30,
31]. Overall, the trend is from a single EKF configuration toward a composite paradigm that blends stronger modeling with robustness mechanisms and selective data-driven augmentation [
21,
27,
29].
For multi-sensor fusion, EKF/UKF and information filters are widely used to incorporate heterogeneous measurements in a unified state space [
26]. System-level challenges include time synchronization, clock drift, and asynchronous sampling; engineering practice therefore adds gating (Mahalanobis distance/chi-square thresholds) and consistency checks (NIS/NEES) to mitigate mismatches [
24,
32]. In underwater settings, fixed measurement covariances often fail to capture time-varying quality, pushing the field toward adaptive “dynamic confidence” weighting driven by innovation statistics, geometry (range/field-of-view changes), or link-quality proxies, to balance stability and information use [
24,
33]. Back-end methods based on graph optimization/smoothing (batch or sliding-window) have also been applied to absorb irregularly timed measurements and improve global consistency in offline or near-real-time modes; under low-bandwidth, high-delay acoustic links, their computational and communication costs must be weighed carefully [
34,
35]. In short, fusion research has moved from idealized assumptions—synchronous, homogeneous, fixed confidence—to realistic constraints: asynchronous, heterogeneous, and time-varying confidence [
24,
26,
32].
Acoustic communication is a key external factor shaping fusion strategies [
9,
36]. Compared with radio, underwater acoustic links have orders-of-magnitude lower bandwidth, data rates on the order of kilobits per second, and slow propagation. Information experiences significant delay, jitter, and instability; on larger-scale networks, propagation delay depends on medium and range and varies with environment and mission, defying a single universal model [
9,
36,
37]. These properties directly exacerbate delayed, out-of-order, and intermittent measurements [
9,
32]. Any online estimator that seeks to exploit “global” external observations must therefore account for low-speed, uncertain links, or it will inevitably suffer severe time-misalignment errors and inconsistent updates [
14,
32].
For delayed and out-of-sequence data, target tracking and navigation offer a well-developed toolbox [
14,
32,
38]. Early work derived consistent update rules and first-order optimal approximations for delayed measurements: perform the measurement update at the original measurement time, then propagate the corrected state and covariance forward to the present, thereby avoiding the inconsistency of treating late data as current; this extends naturally to multi-step delays [
14,
38]. Subsequent studies trade accuracy for latency via model switching, smoothing, and approximate recomputation [
14,
32]. More recent efforts propose simplified backward-propagation/forward-replay variants of the Kalman filter, or “delay-aware” EKFs in robotic navigation for remote or lagged measurements [
14,
32,
38]. In essence, OOSM handling has evolved from offline smoothing to real-time backward–forward frameworks suitable for engineering systems.
For cooperative localization and docking, traditional long-baseline (LBL), short-baseline (SBL), and ultra-short-baseline (USBL) systems provide mature external positioning in multi-AUV/base configurations [
39,
40,
41]. In more flexible missions, mobile beacons/stations can yield more favorable geometry and improve access to global references. Distributed or centralized EKF-based frameworks may absorb relative measurements in a loosely coupled fashion within separate filters, or tightly couple/augment the states in a unified estimator that embeds relative kinematics, thereby suppressing relative drift and improving terminal accuracy [
40,
41]. Prior studies indicate that single-beacon (SLBL) and mobile-LBL concepts can enable effective acoustic guidance and homing with minimal infrastructure, and distributed acoustic navigation shows promise for sharing information and improving global consistency across platforms. For the strongly coupled docking task, however, unified augmented estimation is typically better suited to achieving closed-loop consistent alignment of pose and trajectory [
39,
41].
In summary, three lines of work have pushed the frontier: (i) stronger modeling and filtering for AUV navigation under complex flow and sensor degradation, (ii) robustness and adaptive weighting for heterogeneous, asynchronous fusion, and (iii) backward–forward OOSM handling and augmented cooperative estimation frameworks. What remains underexplored is combining severe acoustic delay/out-of-sequence effects, online adaptive confidence, and the unified-frame, closed-loop estimation required at docking within a single real-time system. Existing solutions often lack a clean interface between delay compensation and cooperative augmentation, or rely on static weighting that mismatches link quality and innovation statistics. This motivates an integrated method that connects consistent delay absorption (backward–forward replay), dynamic confidence weighting (driven by NIS/delay/observation conditions), and cooperative state–observation augmentation (closed-loop estimation in a unified frame) to move AUV–MDS cooperative docking from proof of concept toward engineering deployment.
3. System Model and Problem Formulation
This section specifies the cooperative AUV–mobile docking station (MDS) setup, the reference frames, motion and measurement models, and the delay/out-of-sequence (OOS) characteristics induced by the acoustic link. On this basis we formalize the state-estimation problem that underpins both the DC-EKF and the Co-Aug-EKF.
3.1. AUV–MDS System Architecture
The system consists of one AUV and one maneuverable MDS. The AUV carries onboard sensors (IMU, DVL, depth) that provide high-rate but drifting proprioceptive measurements. The MDS carries an acoustic positioning device (e.g., USBL-like range/bearing sensing) that yields globally referenced relative observations of the AUV. The two platforms exchange measurements and lightweight metadata (timestamps, quality indicators) over an acoustic link characterized by low bandwidth, significant propagation delay and jitter, and occasional OOS arrivals. Control inputs and mission planning are not the focus here; we treat “known or measurable body-frame commands” as exogenous inputs to the kinematic models.
To keep the system model concrete, we instantiate it with the parameters of a typical large, ice-capable survey-class AUV and a matching mobile docking station. Vehicles in the Autosub family have also been studied from an engineering-reliability perspective (e.g., formal-methods-based automated diagnosis), which highlights practical constraints in long-duration missions and motivates the representative platform assumptions adopted here [
42]. Recent reviews further summarize the unique challenges and applications of polar AUV operations, supporting the “ice-capable, survey-class” mission envelope considered in this work [
43]. On the docking-infrastructure side, docking-station concepts for 21-inch-class AUVs have been developed and ocean-tested, informing the representative docking-port geometry and homing-stage assumptions used in our simulations [
44]. In the terminal meters, robust visual guidance and positioning using dedicated light/marker arrangements have also been reported, consistent with our assumption that short-range optical aids can refine final alignment when local visibility permits [
45]. In addition, surveys on low-cost underwater sensing and communication platforms motivate modeling the station-side sensing as an acoustic range/bearing device with lightweight metadata exchange under constrained links [
46]. The AUV is representative of vehicles such as Autosub 6000 and other deep-diving 21-inch-class AUVs used for polar and deep-sea missions [
47,
48]. These vehicles are about 5–6 m long and 0.7–0.9 m in diameter, with dry masses on the order of 1–2 t and depth ratings of several thousand meters. In our simulations, the AUV is modeled as a 5.5 m long, 0.9 m diameter hull with a dry mass of about 1.8 t, a cruise speed of 1.5 m/s (≈3 kn), and a maximum operating depth of 6000 m. The onboard navigation package comprises a tactical-grade IMU, a 300–600 kHz DVL, and a pressure depth sensor. The IMU provides measurements at tens of hertz (10 Hz in the simulations below), while the DVL and depth channels operate at a few hertz. Nominal gyro and accelerometer biases are in the ranges 1–10 °/h and 50–500 µg, respectively, and the DVL delivers bottom-lock velocity that supports a dead-reckoning accuracy of typically 0.2–0.4% of distance traveled, consistent with commercially available marine navigation sensors.
The MDS is modeled as a maneuverable mobile docking station built around a square 4 m × 4 m × 4 m frame that supports a funnel-type docking port aligned with the AUV’s longitudinal axis. The docking port has an entrance diameter of about 1.5 m and a length of 3 m, i.e., a capture aperture roughly 1.5–2 times the AUV diameter, similar to reported docking-station designs for 21-inch-class AUVs [
49,
50]. The frame can be mounted on a low-speed carrier or equipped with small corner wheels for repositioning, but in the estimator we simply treat the MDS as a low-speed rigid body as in
Section 3.3. The MDS houses a USBL-like acoustic positioning head and a digital acoustic modem operating in the 20–30 kHz band. Typical working ranges are 1–3 km with data rates of a few kilobits per second; one-way propagation delay is about 0.7 s per kilometer, and additional MAC/queuing and processing delays lead to effective measurement delays of several seconds [
50]. In the simulations we model the acoustic delay as a bounded random variable τ ∈ [0, τ_max], with τ_max between 5 and 10 s depending on the experiment (e.g., τ_max = 5 s in the delay-distribution study in
Section 4). These values are not tuned to any specific vehicle; they are chosen to be representative of large survey-class AUVs and their docking hardware and to provide realistic scales for sensor rates, noise levels, and acoustic delay. The representative specifications of the AUV, the MDS, and the onboard/acoustic sensors used in the simulations are summarized in
Table 1.
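The quoted figure of about 0.7 s per kilometer can be checked with a few lines of arithmetic. The sketch below assumes a nominal sound speed of 1500 m/s (a typical seawater value, not a number given in the text) and covers propagation only; MAC/queuing and processing latency come on top of it. The function name is illustrative.

```python
SOUND_SPEED_M_S = 1500.0  # assumed nominal sound speed in seawater


def one_way_delay_s(range_m: float, sound_speed_m_s: float = SOUND_SPEED_M_S) -> float:
    """Propagation-only one-way delay; excludes MAC/queuing and processing latency."""
    if range_m < 0.0 or sound_speed_m_s <= 0.0:
        raise ValueError("range must be non-negative and sound speed positive")
    return range_m / sound_speed_m_s


# Working ranges of 1-3 km, as quoted for the acoustic modem.
for range_km in (1.0, 2.0, 3.0):
    delay = one_way_delay_s(range_km * 1000.0)
    print(f"{range_km:.0f} km -> {delay:.2f} s one-way")
```

Under this assumption, propagation alone accounts for roughly 0.7 to 2 s over the working range, so the several-second effective delays used in the simulations are dominated by link-layer and processing effects at short range.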
In a typical mission profile, docking is therefore organized in two regimes. Outside a few tens of meters, only inertial and delayed acoustic information are reliably available, and the estimator described in this paper is responsible for keeping the AUV–MDS relative state within a few meters in this “acoustic homing” corridor [
18,
50]. Once the vehicle has entered a small capture zone around the docking port and local water clarity permits, short-range optical aids (e.g., cameras and coded visual markers on the funnel) can be used to sharpen the final alignment [
50]. The present work focuses on the former, delay-affected acoustic–inertial regime and provides delay-robust, statistically consistent state estimates to hand over to such optical guidance in the last meters.
3.2. Frames and Transformations
Let W be the global inertial frame (approximately a level-plane inertial frame), A the AUV body frame, and B the MDS body frame. Let $R_A$ and $R_B$ denote the direction-cosine (rotation) matrices of A and B with respect to W (equivalently unit quaternions $q_A$, $q_B$). Positions and velocities in W are $p_A, p_B$ and $v_A, v_B$. Body-to-inertial vector transforms satisfy

$$\mathbf{a}^{W} = R_A\,\mathbf{a}^{A}, \qquad \mathbf{a}^{W} = R_B\,\mathbf{a}^{B}$$

for any vector $\mathbf{a}$ expressed in the respective body frame. If needed, fixed sensor extrinsics (e.g., IMU/DVL/transducer lever arms and orientations) are represented by rigid transforms $T \in SE(3)$. We treat them as calibrated constants or include them as slowly drifting states when required.
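As a small illustration of the body-to-inertial transform, the sketch below converts a unit quaternion to a direction-cosine matrix and rotates a body-frame vector into W. It assumes the scalar-first Hamilton convention (w, x, y, z) with the quaternion mapping body vectors to W; the paper's own convention is not stated here, and the function names are hypothetical.

```python
import math


def quat_to_dcm(q):
    """Unit quaternion (w, x, y, z), Hamilton convention -> 3x3 body-to-W DCM."""
    w, x, y, z = q
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n  # guard against small norm drift
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def body_to_world(q, vec_body):
    """Apply a^W = R(q) a^body."""
    R = quat_to_dcm(q)
    return [sum(R[i][j] * vec_body[j] for j in range(3)) for i in range(3)]


# A 90-degree yaw about z maps the body x-axis onto the W y-axis.
q_yaw90 = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
v_world = body_to_world(q_yaw90, [1.0, 0.0, 0.0])
print([round(c, 6) for c in v_world])
```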
3.3. Kinematic Models
For practicality and estimator consistency we adopt a “simplified 6-DoF rigid body and random-walk bias” continuous–discrete model.
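- (1) AUV dynamics (continuous time)

$$\dot{p}_A = v_A, \qquad \dot{v}_A = R_A\,(a_m - b_a - n_a) + g, \qquad \dot{q}_A = \tfrac{1}{2}\,\Omega(\omega_m - b_g - n_g)\,q_A,$$

where $R_A$ is the rotation of A with respect to W, $g$ is gravity, $a_m$, $\omega_m$ are IMU accelerometer/gyroscope readings, $b_a$, $b_g$ are biases, $n_a$, $n_g$ are IMU noises, and $\Omega(\cdot)$ is the quaternion kinematic operator. Biases follow random walks: $\dot{b}_a = n_{b_a}$, $\dot{b}_g = n_{b_g}$. The discrete form is

$$x_{A,k+1} = f_A(x_{A,k}, u_{A,k}) + w_{A,k},$$

with $x_A$ the stacked AUV state, $u_{A,k} = (a_m, \omega_m)_k$ the IMU input, and $w_{A,k}$ the discretized process noise.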
- (2) MDS dynamics

The MDS executes low-speed maneuvers with a smoother model:

$$\dot{x}_B = f_B(x_B, u_B) + w_B,$$

where $u_B$ is an equivalent thrust/actuation input (from velocity commands or a simplified maneuvering proxy) and $w_B$ is process noise. The discrete form is

$$x_{B,k+1} = f_{B,d}(x_{B,k}, u_{B,k}) + w_{B,k}.$$
We use different state vectors at two filtering levels:
- (1) DC-EKF (AUV-centric): $x_A = [p_A^\top, v_A^\top, q_A^\top, b_a^\top, b_g^\top]^\top$.
- (2) Co-Aug-EKF (collaborative): $x = [x_A^\top, x_B^\top, \delta^\top]^\top$, where $\delta$ optionally includes small inter-frame alignment errors, clock offset, or slowly drifting extrinsics to enable closed-loop consistent estimation in a unified frame.
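To make the AUV propagation model of this subsection concrete, the following is a minimal sketch of one noise-free strapdown step for position, velocity, and attitude with bias-corrected IMU readings. It is not the authors' implementation: the scalar-first Hamilton quaternion convention, the z-down gravity vector, the exact-rotation attitude increment, and all function names are assumptions made for illustration. Biases are held constant over the step, which is the mean behavior of a random walk.

```python
import math

GRAVITY_W = (0.0, 0.0, 9.81)  # assumed: W has z pointing down (NED-like)


def quat_mul(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw * bw - ax * bx - ay * by - az * bz,
            aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw)


def rotate(q, vec_body):
    """Rotate a body-frame vector into W: q * (0, v) * conj(q)."""
    q_conj = (q[0], -q[1], -q[2], -q[3])
    return quat_mul(quat_mul(q, (0.0, *vec_body)), q_conj)[1:]


def propagate(p, v, q, b_a, b_g, a_m, w_m, dt):
    """One noise-free step of p' = v, v' = R(a_m - b_a) + g, q' = 0.5 * Omega(w_m - b_g) q."""
    a_body = [a_m[i] - b_a[i] for i in range(3)]
    w_body = [w_m[i] - b_g[i] for i in range(3)]
    a_world = [r + g for r, g in zip(rotate(q, a_body), GRAVITY_W)]
    p_new = [p[i] + v[i] * dt + 0.5 * a_world[i] * dt * dt for i in range(3)]
    v_new = [v[i] + a_world[i] * dt for i in range(3)]
    rate = math.sqrt(sum(w * w for w in w_body))
    if rate * dt > 1e-12:
        s = math.sin(0.5 * rate * dt) / rate  # scales w_body to axis * sin(angle / 2)
        dq = (math.cos(0.5 * rate * dt), w_body[0] * s, w_body[1] * s, w_body[2] * s)
    else:
        dq = (1.0, 0.0, 0.0, 0.0)
    q_new = quat_mul(q, dq)  # body-rate increment multiplies on the right
    norm = math.sqrt(sum(c * c for c in q_new))
    return p_new, v_new, tuple(c / norm for c in q_new)


# A stationary, level vehicle senses specific force -g and should not move.
zero3 = [0.0, 0.0, 0.0]
p1, v1, q1 = propagate(zero3, zero3, (1.0, 0.0, 0.0, 0.0), zero3, zero3,
                       [0.0, 0.0, -9.81], zero3, dt=0.1)
print(p1, v1, q1)
```

In a full EKF the same step would be accompanied by the covariance propagation with the Jacobian of this map, which is omitted here.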
3.4. Measurement Models (With Delay)
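- (1) AUV proprioceptive measurements

IMU outputs drive state propagation; DVL and depth are used for updates. A typical discrete model is

$$z_{\mathrm{dvl},k} = R_A^\top v_A + n_{\mathrm{dvl}}, \qquad z_{\mathrm{dep},k} = e_3^\top p_A + n_{\mathrm{dep}},$$

where $e_3 = [0, 0, 1]^\top$ and all quantities are evaluated at step $k$. Noises $n_{\mathrm{dvl}}$, $n_{\mathrm{dep}}$ are zero-mean Gaussian; their covariances may vary slowly with conditions (e.g., bottom-lock quality), which we reflect later via time-varying $R_k$.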
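- (2) MDS relative observations of the AUV

Using a USBL-like range/bearing model, let $r = R_B^\top (p_A - p_B) = [r_x, r_y, r_z]^\top$ be the relative vector in frame B. Then

$$z_B = \begin{bmatrix} \rho \\ \alpha \\ \varepsilon \end{bmatrix} + n_B, \qquad \rho = \lVert r \rVert, \quad \alpha = \operatorname{atan2}(r_y, r_x), \quad \varepsilon = \operatorname{atan2}\!\left(r_z, \sqrt{r_x^2 + r_y^2}\right).$$

An equivalent “range + unit-bearing vector” model can also be used. We only require that the observation function be continuously differentiable for first-order linearization.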
- (3) Delay and OOS model

Let a measurement be generated at time $t_m$ and received at time $t_r$; the delay is $\tau = t_r - t_m$. We assume $\tau$ is a nonnegative random variable with bounded support $[0, \tau_{\max}]$ whose statistics can be estimated online; in simulations we consider several bounded models (uniform, truncated normal, exponential-like, and a short–long mixture) to assess sensitivity to the delay distribution (see Section 4). OOS means measurements with $t_{m,i} < t_{m,j}$ may arrive in reverse order. For implementation, each measurement carries its timestamp and quality indicators (e.g., confidence, SNR proxy). A ring buffer over $\{\hat{x}_i, P_i\}$ covers the window $[t_k - \tau_{\max}, t_k]$. AUV proprioceptive measurements are treated as local near-real-time (small delays may be neglected or explicitly modeled).
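The USBL-like range/bearing relation in item (2) can be sketched as follows. This is an illustrative, noise-free observation function rather than the authors' code: the azimuth/elevation definitions, the sign of the elevation angle, and the name `range_bearing` are assumptions, and `R_wb` denotes the rotation of the MDS body frame B with respect to W.

```python
import math


def range_bearing(p_auv_w, p_mds_w, R_wb):
    """Noise-free range, azimuth, elevation of the AUV as seen in the MDS frame B.

    R_wb is the 3x3 DCM of B with respect to W (maps B-frame vectors into W),
    so the relative vector in B is r = R_wb^T (p_A - p_B).
    """
    d = [a - b for a, b in zip(p_auv_w, p_mds_w)]
    r = [sum(R_wb[j][i] * d[j] for j in range(3)) for i in range(3)]  # R^T d
    rho = math.sqrt(sum(c * c for c in r))
    azimuth = math.atan2(r[1], r[0])
    elevation = math.atan2(r[2], math.hypot(r[0], r[1]))  # assumed sign convention
    return rho, azimuth, elevation


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
rho, az, el = range_bearing([3.0, 4.0, 0.0], [0.0, 0.0, 0.0], IDENTITY)
print(rho, math.degrees(az), math.degrees(el))
```

The function is smooth away from the origin and from the vertical axis of B, which is what first-order linearization requires; near those configurations the range + unit-bearing form mentioned above is usually the safer choice.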
3.5. Mathematical Problem Statement
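Given a prior $(\hat{x}_0, P_0)$, input sequences $\{u_i\}$, and two classes of measurements:
(i) near-real-time AUV measurements $\{z^A_i\}$;
(ii) MDS relative measurements $\{z^B_j\}$ with physical timestamps $t_j$ that may be delayed or OOS at receive time k, the goal at each k is to compute the posterior estimate $\hat{x}_{k|k}$ with covariance $P_{k|k}$ for either $x_A$ (DC-EKF) or $x$ (Co-Aug-EKF) by minimizing the conditional negative log-likelihood (MAP):

$$\hat{x}_{0:k} = \arg\min_{x_{0:k}} \Big\{ \lVert x_0 - \hat{x}_0 \rVert^2_{P_0} + \sum_{i=0}^{k-1} \lVert x_{i+1} - f(x_i, u_i) \rVert^2_{Q_i} + \sum_{i=1}^{k} \lVert z^A_i - h_A(x_i) \rVert^2_{R^A_i} + \sum_{j \in \mathcal{J}_k} \lVert z^B_j - h_B(x(t_j)) \rVert^2_{R^B_j} \Big\},$$

where $\lVert e \rVert^2_{\Sigma} = e^\top \Sigma^{-1} e$, $\mathcal{J}_k$ indexes the MDS measurements received up to step k, and $x(t_j)$ is the state at the physical time $t_j$. Because $\{z^B_j\}$ contains delayed/OOS data, substituting “current time” for $t_j$ is inappropriate. We therefore implement a two-level mechanism:
DC-EKF: Use the ring buffer to perform a consistency-preserving measurement update at $t_j$, then forward-replay the corrected $(\hat{x}, P)$ with cached inputs/noise to time k;
Co-Aug-EKF: In a unified frame, absorb bidirectional observations and relative kinematics with an augmented state to achieve closed-loop consistent estimation.
Additionally, we introduce a delay threshold $\tau_{\mathrm{th}}$ and a dynamic confidence weight $\lambda_j$. Measurements with $\tau_j > \tau_{\mathrm{th}}$ are softly down-weighted or discarded; for $\tau_j \le \tau_{\mathrm{th}}$, we adapt $R^B_j$ (or the equivalent information matrix) online according to innovation Mahalanobis distance/NIS and proxies for range/occlusion and link quality, so that fusion weights track information content.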
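One simple way to realize the threshold-plus-weighting idea for a scalar measurement is sketched below. The functional form is an assumption made purely for illustration (a hard discard beyond the threshold, linear inflation with delay, and inflation by the NIS when it exceeds its expected value of 1); the paper's actual weighting law is developed in Section 5 and may differ. The names `nis_scalar`, `adapted_variance`, and `alpha` are hypothetical.

```python
def nis_scalar(innovation: float, innovation_variance: float) -> float:
    """Normalized innovation squared for a scalar measurement: nu^2 / S."""
    return innovation * innovation / innovation_variance


def adapted_variance(r_nominal, nis, delay_s, tau_th_s, alpha=1.0):
    """Return an inflated measurement variance, or None if the measurement is discarded.

    Assumed rule (illustrative only):
      * delay above the threshold      -> discard
      * delay within the threshold     -> inflate by 1 + alpha * delay / tau_th
      * NIS above its expected value 1 -> inflate further by the NIS itself
    """
    if delay_s > tau_th_s:
        return None
    delay_factor = 1.0 + alpha * delay_s / tau_th_s
    nis_factor = max(1.0, nis)
    return r_nominal * delay_factor * nis_factor


# Fresh, well-behaved fix keeps its nominal variance; a late, surprising one is down-weighted.
print(adapted_variance(0.25, nis=0.5, delay_s=0.0, tau_th_s=5.0))
print(adapted_variance(0.25, nis=4.0, delay_s=5.0, tau_th_s=5.0))
print(adapted_variance(0.25, nis=1.0, delay_s=6.0, tau_th_s=5.0))
```

Inflating the variance rather than scaling the gain directly keeps the update inside the standard Kalman equations, so the Joseph-form covariance update and the NIS/NEES checks continue to apply unchanged.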
3.6. Noise and Uncertainty Assumptions
Process noises $w_k$ and measurement noise $v_k$ are modeled as zero-mean Gaussian white sequences; IMU biases follow random walks. The process and measurement covariances are allowed to be time-varying (heteroscedastic). For acoustic observations, $R_k$ typically depends on range, occlusion, and sea state; this variability is later encoded through the dynamic confidence weighting used in the DC-EKF and Co-Aug-EKF layers. Consistency control relies on Mahalanobis/chi-square gating and NIS/NEES monitoring, together with Joseph-form covariance updates and occasional spectral projection to maintain positive definiteness and numerical stability.
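The gating and Joseph-form update mentioned above can be illustrated with a small pure-Python sketch for an n-dimensional state and a scalar measurement. It is a minimal example under stated assumptions, not the authors' code: the measurement is linear with row vector `H`, the gate of 3.84 is the usual 95% chi-square bound for one degree of freedom, and the spectral-projection step is omitted because it needs an eigendecomposition.

```python
CHI2_95_DOF1 = 3.84  # 95% chi-square bound for a scalar innovation


def joseph_update(x, P, z, H, R, gate=CHI2_95_DOF1):
    """Gated Kalman update with Joseph-form covariance for a scalar measurement.

    x: list of n floats, P: n x n nested list, H: list of n floats (row vector),
    z and R: scalars. Returns (x_new, P_new, nis, accepted).
    """
    n = len(x)
    PHt = [sum(P[i][j] * H[j] for j in range(n)) for i in range(n)]
    S = sum(H[i] * PHt[i] for i in range(n)) + R      # innovation variance
    nu = z - sum(H[i] * x[i] for i in range(n))       # innovation
    nis = nu * nu / S
    if nis > gate:
        return x, P, nis, False                       # rejected as an outlier
    K = [PHt[i] / S for i in range(n)]
    x_new = [x[i] + K[i] * nu for i in range(n)]
    # Joseph form: P+ = (I - K H) P (I - K H)^T + K R K^T
    A = [[(1.0 if i == j else 0.0) - K[i] * H[j] for j in range(n)] for i in range(n)]
    AP = [[sum(A[i][k] * P[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    P_new = [[sum(AP[i][k] * A[j][k] for k in range(n)) + K[i] * R * K[j]
              for j in range(n)] for i in range(n)]
    return x_new, P_new, nis, True


x0 = [0.0, 0.0]
P0 = [[4.0, 1.0], [1.0, 2.0]]
x1, P1, nis1, ok1 = joseph_update(x0, P0, z=2.0, H=[1.0, 0.0], R=1.0)
_, _, nis2, ok2 = joseph_update(x0, P0, z=10.0, H=[1.0, 0.0], R=1.0)
print(x1, P1, nis1, ok1)
print(nis2, ok2)
```

With the optimal gain the Joseph form equals the shorter expression P - K S K^T in exact arithmetic, but it remains symmetric and positive semidefinite under rounding and under suboptimal gains, which is why it is preferred for long-running filters.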
Although these assumptions follow the standard EKF practice and enable tractable consistency analysis, actual underwater acoustic channels can exhibit heavy tails, occasional outliers, and slowly drifting biases [
10]. To assess the impact of such departures from the Gaussian hypothesis,
Section 4 includes a dedicated robustness study in which the DC-EKF is run under Laplace, Gaussian-mixture, and biased-mixture acoustic noise models while still assuming Gaussian measurement noise in the station channel. The results show that moderate non-Gaussian behavior mainly causes a small, bounded inflation relative to the delay-free baseline rather than catastrophic degradation. Collecting the noise and uncertainty assumptions in this subsection provides a common basis for the algorithmic designs and consistency checks developed in
Section 4,
Section 5 and
Section 6, and avoids repeating them when describing each filtering layer.
3.7. Summary of System Model
We have specified a common frame convention, simplified rigid-body dynamics for both platforms, the AUV/MDS observation relations, and an acoustic-link delay/OOS model. The estimation problem is cast as online MAP with timestamped measurements, highlighting two engineering constraints: (i) late measurements must be absorbed at their physical time to preserve consistency; and (ii) measurement quality is time-varying (heteroscedastic).
Figure 1 summarizes how these elements are assembled into a single processing chain. Starting from onboard and MDS measurements, delayed/OOS acoustic observations are time-rectified via ring-buffer-based backward–forward replay, admitted according to a learned delay threshold, and fused with dynamic confidence weighting before entering the cooperative augmented EKF together with relative motion constraints. The next sections build on this pipeline in detail:
Section 4 presents the DC-EKF,
Section 5 elaborates delay threshold calibration and adaptive confidence weighting, and
Section 6 introduces the Co-Aug-EKF for unified-frame, closed-loop estimation of both platforms’ states.
4. Delay-Compensated EKF (DC-EKF)
Building on the system model, this section presents the real-time DC-EKF filtering framework for delayed and out-of-sequence (OOS) acoustic measurements, corresponding to the lower-level delay-compensation block in Figure 1. The key idea is to maintain a ring buffer that covers the maximum admissible delay, apply each late measurement at its original physical time to obtain a consistency-preserving update, and then forward-replay the corrected state and covariance to the current time. By avoiding the misuse of “past information” on the “current state,” the filter prevents time misalignment and estimator jumps, and it naturally accommodates link jitter and OOS arrivals.
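We consider a discrete-time nonlinear system

$$x_{k+1} = f(x_k, u_k) + w_k, \qquad z_k = h(x_k) + v_k,$$

with $w_k \sim \mathcal{N}(0, Q_k)$ and $v_k \sim \mathcal{N}(0, R_k)$. A nominal EKF proceeds at each step with predict–update:

$$\hat{x}_{k|k-1} = f(\hat{x}_{k-1|k-1}, u_{k-1}), \qquad P_{k|k-1} = F_{k-1} P_{k-1|k-1} F_{k-1}^\top + Q_{k-1},$$

$$K_k = P_{k|k-1} H_k^\top \left(H_k P_{k|k-1} H_k^\top + R_k\right)^{-1},$$

$$\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k\left(z_k - h(\hat{x}_{k|k-1})\right), \qquad P_{k|k} = (I - K_k H_k) P_{k|k-1} (I - K_k H_k)^\top + K_k R_k K_k^\top,$$

where $F_{k-1} = \partial f / \partial x \,|_{\hat{x}_{k-1|k-1}}$ and $H_k = \partial h / \partial x \,|_{\hat{x}_{k|k-1}}$. This flow implicitly assumes “measurement time = use time.” When an external observation generated at physical time $t_m$ is received at step $k$ with delay $d = k - m$ (with $m$ the discrete index associated with the timestamp), a direct update at time $k$ yields innovation statistics and covariance evolution that are inconsistent, typically producing error spikes, NIS overruns, and even numerical instability. Therefore, a late measurement must be absorbed at $m$ [11, 47, 48].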
DC-EKF implements this with a ring buffer that explicitly carries a short history. The buffer spans the most recent $N$ steps (set by the maximum admissible delay and the filter rate) and stores, for each step $i$, the pair $(\hat{x}_{i|i}, P_{i|i})$ together with the information needed to recompute propagation from $i$ to $i+1$ (inputs, recomputable Jacobians, and process-noise terms). Upon receiving a timestamped external measurement, the filter locates its generation time $m$ in the buffer. If $m$ lies within the window, it performs a consistency-preserving update at $m$ on $(\hat{x}_{m|m}, P_{m|m})$ (Joseph-form covariance with Mahalanobis gating to suppress outliers), yielding $(\hat{x}^{+}_{m|m}, P^{+}_{m|m})$. It then re-applies the cached propagation operators in chronological order from $m$ to $k$, refreshing $(\hat{x}_{i|i}, P_{i|i})$ along the way. If multiple late measurements fall within the same interval, they are processed in timestamp order to preserve causality. Extremely stale measurements falling outside the window are ignored to avoid disproportionate cost. The mechanism requires no prior assumption on delay statistics and is naturally robust to jitter and OOS [14, 32, 38].
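The backward-update and forward-replay mechanism can be sketched on a scalar linear model, where its key property is easy to verify: a fix absorbed late gives the same posterior as the same fix absorbed on time. The code below is a simplified illustration, not the authors' DC-EKF. It assumes a random-walk-plus-input model, caches the measurements already applied at each buffered step so that they are re-applied during replay, and omits gating and relinearization; the class and field names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Slot:
    step: int
    x_prior: float
    P_prior: float
    u: float = 0.0                              # input used to propagate step -> step + 1
    meas: list = field(default_factory=list)    # (z, R) pairs absorbed at this step
    x_post: float = 0.0
    P_post: float = 0.0


class ScalarDCKF:
    """Scalar Kalman filter x_{k+1} = x_k + u_k + w_k, z = x + v, with a replay buffer."""

    def __init__(self, x0, P0, Q, window):
        self.Q = Q
        self.buf = deque(maxlen=window)          # ring buffer of recent steps
        self.buf.append(Slot(0, x0, P0, x_post=x0, P_post=P0))

    def _refresh(self, slot):
        """Recompute the posterior of one slot from its prior and cached measurements."""
        x, P = slot.x_prior, slot.P_prior
        for z, R in slot.meas:
            K = P / (P + R)
            x = x + K * (z - x)
            P = (1.0 - K) ** 2 * P + K * K * R   # Joseph form, scalar case
        slot.x_post, slot.P_post = x, P

    def predict(self, u):
        last = self.buf[-1]
        last.u = u
        self.buf.append(Slot(last.step + 1, last.x_post + u, last.P_post + self.Q))
        self._refresh(self.buf[-1])

    def measure(self, step, z, R):
        """Absorb a measurement stamped `step`; returns False if outside the window."""
        oldest, newest = self.buf[0].step, self.buf[-1].step
        if step < oldest or step > newest:
            return False                         # too stale (or from the future): ignore
        idx = step - oldest
        self.buf[idx].meas.append((z, R))
        self._refresh(self.buf[idx])             # update at the original time
        for i in range(idx + 1, len(self.buf)):  # forward replay to the present
            prev, cur = self.buf[i - 1], self.buf[i]
            cur.x_prior, cur.P_prior = prev.x_post + prev.u, prev.P_post + self.Q
            self._refresh(cur)
        return True

    @property
    def state(self):
        return self.buf[-1].x_post, self.buf[-1].P_post


def run(late):
    kf = ScalarDCKF(x0=0.0, P0=1.0, Q=0.01, window=10)
    for k in range(1, 6):
        kf.predict(u=1.0)
        kf.measure(k, z=k + 0.1, R=0.5)          # local sensor, on time
        if k == 2 and not late:
            kf.measure(2, z=2.05, R=0.05)        # station fix, on time
    if late:
        kf.measure(2, z=2.05, R=0.05)            # same fix, three steps late
    return kf.state


x_on_time, P_on_time = run(late=False)
x_late, P_late = run(late=True)
print(x_on_time, x_late, P_on_time, P_late)
```

For the nonlinear case the same structure applies, with Jacobians re-evaluated along the corrected trajectory during replay, so the equivalence with in-order processing holds only to first order.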
From a cost and memory standpoint, let $n$ be the state dimension. One late measurement incurs one update plus $d = k - m$ re-propagations, for a cost of approximately $O(d\,n^3)$. The average overhead grows roughly linearly with the buffer length. In typical AUV–MDS missions, only a fraction of external observations trigger backward–forward replay; overall real-time performance remains tractable. Implementation stores $(\hat{x}_{i|i}, P_{i|i})$ and recomputable terms, without retaining full transition matrices, striking a practical balance between memory and numerical stability.
To assess the effect of DC-EKF, we simulated cooperative docking with both AUV proprioceptive measurements and MDS relative observations. External measurement delays vary randomly within a prescribed range and may arrive OOS. Baselines include: an ideal zero-delay “Standard EKF” that fuses onboard IMU and station measurements assuming instantaneous, delay-free arrival; a “no-compensation” scheme that applies delayed external observations at the current step; a “fixed-delay” scheme that assumes a known lag and aligns accordingly; and the proposed DC-EKF. Metrics include position RMSE, time evolution of the estimation error, and average per-step computation time; we also recorded statistics on how late measurements are handled. Experiment 1 uses a moderate delay range and a fixed update rate to illustrate typical behavior, followed by two sensitivity studies: one varying the maximum delay (
Figure 2) and another examining the effect of different delay distributions while keeping τ_max fixed.
Figure 2 compares RMSE across delay regimes. As the delay bound increases, both “no-compensation” and “fixed-delay” exhibit monotonic error growth, reflecting the inherent risk of treating late data as current under strong jitter. DC-EKF maintains low error for low-to-moderate delays; as delays become large, errors rise but remain clearly below the uncompensated schemes. The trend indicates that as long as late observations fall within the buffer and can be absorbed consistently, backward–forward replay converts them into effective global constraints. As the delay bound increases beyond the buffer window, the marginal utility of additional delayed measurements diminishes; DC-EKF then degrades gracefully toward the onboard-only dead-reckoning level, while still avoiding the sharp jump behavior seen in uncompensated methods.
Figure 3 shows representative time histories of the estimation error. Uncompensated methods display clustered spikes when bursts of late observations arrive, whereas DC-EKF tracks the delay-free Standard-EKF baseline closely and is markedly smoother. Immediately after late data are received, the backward update and forward replay take effect and the error decays rapidly over the next few steps, evidencing the restoration of temporal consistency. This time-domain view corroborates the aggregate statistics in
Figure 2.
Figure 4 summarizes overall RMSE for the four methods in the same scenario. Relative to uncompensated schemes, DC-EKF markedly reduces error amplification caused by time misalignment; relative to the delay-free Standard-EKF baseline, it retains comparable error levels under moderate delays while avoiding estimator jumps. Runtime counters indicate that under moderate delays only a small portion of external observations trigger replay, with most being absorbed “in place.” As the maximum delay grows, the trigger ratio increases; a few extreme-delay or gated-out cases are rejected, yet the overall error profile remains stable.
Figure 5 reports average per-step computation time. Compared with schemes that do not replay, DC-EKF increases runtime only modestly, with growth roughly linear in buffer length, matching the complexity analysis. Thus, introducing a ring buffer and backward–forward replay does not impose prohibitive overhead for online deployment; window length and step rate can be tuned to preserve margin.
To further examine how strongly the estimator depends on the delay statistics themselves, we conducted an additional sensitivity study in which the maximum delay was fixed while the shape of the delay distribution was varied. Specifically, with τ_max = 5 s we considered four bounded models for the acoustic delay τ: a uniform law, a truncated normal law, an exponential-like law favoring small delays, and a bimodal mixture of short and long delays. All other simulation settings (trajectory, noise levels, sensor rates) were kept identical to those in Experiment 1, and for each case thirty Monte Carlo runs were performed with a fixed random seed to ensure reproducibility.
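The four bounded delay laws can be sketched as simple samplers. This is only an illustrative sketch: the text fixes τ_max = 5 s and the four families, but the means, spreads, and mixture weight below, as well as the names `sample_delay` and `mean_delays`, are assumptions chosen for the example.

```python
import random

TAU_MAX = 5.0  # maximum delay in seconds, as stated in the text

MODELS = ("uniform", "truncated_normal", "exponential_like", "mixture")


def sample_delay(model, rng, tau_max=TAU_MAX):
    """Draw one delay in [0, tau_max] from a named bounded model."""
    if model == "uniform":
        return rng.uniform(0.0, tau_max)
    if model == "truncated_normal":
        # Rejection sampling: redraw until the value lies inside the bounds.
        while True:
            tau = rng.gauss(tau_max / 2.0, tau_max / 4.0)  # assumed mean/std
            if 0.0 <= tau <= tau_max:
                return tau
    if model == "exponential_like":
        # Exponential truncated at tau_max; favors small delays.
        while True:
            tau = rng.expovariate(1.0 / (tau_max / 4.0))   # assumed mean
            if tau <= tau_max:
                return tau
    if model == "mixture":
        # Bimodal: mostly short delays, occasionally long ones.
        if rng.random() < 0.7:                              # assumed weight
            return rng.uniform(0.0, 0.2 * tau_max)
        return rng.uniform(0.8 * tau_max, tau_max)
    raise ValueError(f"unknown delay model: {model}")


def mean_delays(n=10_000, seed=0):
    """Empirical mean delay per model, using a fixed seed for reproducibility."""
    rng = random.Random(seed)
    return {m: sum(sample_delay(m, rng) for _ in range(n)) / n for m in MODELS}
```

All four samplers share the same support, so any difference in filter behavior between them comes from the shape of the distribution rather than from the bound.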
Figure 6 reports the resulting position RMSE for all four fusion strategies. The naïve schemes (“No Delay Comp” and “Fixed Delay”) degrade to roughly 19–24 m RMSE and exhibit noticeable variation across delay models, confirming their sensitivity to how late data are injected as if current. By contrast, the proposed DC-EKF maintains a very similar error level around 1.3 m for all four distributions: the average RMSE is 1.31 ± 0.04 m (uniform), 1.32 ± 0.05 m (truncated normal), 1.27 ± 0.05 m (exponential-like), and 1.32 ± 0.04 m (mixture), corresponding to less than about 4% relative variation across models. At the same time DC-EKF remains close to the delay-free Standard-EKF baseline and more than one order of magnitude better than the naïve delayed-fusion schemes. These results indicate that, as long as the delay is bounded by the buffer horizon, performance depends mainly on the delay bound rather than on the exact delay distribution, and the uniform-delay model used in other experiments serves primarily as a convenient bounded stress test rather than a critical assumption.
To further examine how strongly the proposed estimator depends on the Gaussian-noise assumption in Section 3.6, we conducted an additional robustness study in which the acoustic measurement noise was deliberately made non-Gaussian while the filter still assumed a zero-mean Gaussian model for design and tuning. All other settings (trajectory, sensor rates, delay range τ ∈ [0, 5] s) were kept identical to Experiment 1. Four noise models were considered for the station/MDS channel:
- (i) Gaussian: The baseline zero-mean Gaussian noise with covariance R.
- (ii) Laplace (heavy-tailed): A zero-mean Laplace distribution with a scale parameter chosen so that its variance matches that of the Gaussian case, producing heavier tails but the same nominal second moment.
- (iii) Gaussian mixture (outliers): A mixture model in which, with probability 0.95, measurements are drawn from the baseline Gaussian in (i) and, with probability 0.05, from a zero-mean Gaussian with inflated covariance, mimicking occasional large outliers.
- (iv) Biased mixture (slow drift + outliers): A slowly time-varying bias, modeled as a low-bandwidth random walk, is added to the Gaussian-mixture noise in (iii), emulating calibration errors or environmental shifts that introduce systematic offsets in addition to outliers.
For each noise model, twenty Monte Carlo runs over a 100 s docking segment were performed and the position RMSE was computed for the four fusion strategies.
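The four noise models can be sketched for a scalar channel as follows. The variance matching in the Laplace case follows from the fact that a Laplace law with scale b has variance 2b², so b = σ/√2. The nominal standard deviation, the outlier inflation factor, and the random-walk step size are hypothetical values, since the text does not give them here.

```python
import math
import random

SIGMA = 1.0          # nominal noise std (hypothetical value)
OUTLIER_SCALE = 5.0  # assumed std inflation of the outlier component
BIAS_STEP = 0.01     # assumed std of each random-walk bias increment


def gaussian(rng, sigma=SIGMA):
    return rng.gauss(0.0, sigma)


def laplace(rng, sigma=SIGMA):
    # Laplace variance is 2*b^2, so b = sigma/sqrt(2) matches the Gaussian variance.
    b = sigma / math.sqrt(2.0)
    # The difference of two iid exponentials with mean b is Laplace(0, b).
    return rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)


def gaussian_mixture(rng, sigma=SIGMA):
    # 95% nominal draws, 5% draws from a broader Gaussian (outliers).
    scale = sigma if rng.random() < 0.95 else OUTLIER_SCALE * sigma
    return rng.gauss(0.0, scale)


def biased_mixture_sequence(rng, n, sigma=SIGMA):
    """Mixture noise plus a slowly drifting random-walk bias."""
    bias, out = 0.0, []
    for _ in range(n):
        bias += rng.gauss(0.0, BIAS_STEP)
        out.append(bias + gaussian_mixture(rng, sigma))
    return out


def sample_variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
```

In each case the filter would keep using the nominal Gaussian variance, so the mismatch lies entirely in the simulated channel.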
Figure 7 summarizes the results. As expected, the delay-free Standard-EKF, which fuses onboard IMU and station measurements under an ideal zero-delay assumption, delivers the lowest RMSE (≈0.82–0.85 m) and serves as a best-case reference. The naïve delayed-fusion schemes (No-Delay-Comp and Fixed-Delay) remain in the ≈20–24 m range for all four noise models with standard deviations of 2–4 m, confirming that their dominant failure mode is temporal inconsistency rather than the exact noise distribution. By contrast, the proposed DC-EKF stays much closer to the Standard-EKF bound and is remarkably insensitive to the noise model: its RMSE lies between 1.31 m and 1.33 m across all four cases, with less than about 5% relative variation even in the biased-mixture scenario. These results indicate that, under bounded delays and with the proposed gating and confidence-weighting mechanisms, overall performance is governed primarily by how delayed measurements are time-rectified, while moderate deviations from the Gaussian noise assumption mainly cause a small, bounded inflation relative to the ideal zero-delay baseline [20, 49].
For reproducibility, Algorithm 1 summarizes the ring-buffer update and backward–forward replay: a delayed measurement is first located in the buffer, then applied at its physical timestamp using a Joseph-form update and NEES/NIS gating, and the corrected state/covariance are replayed forward to the current time using cached inputs.
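Algorithm 1: Ring-buffer-based DC-EKF update for delayed/OOS measurements
Inputs:
1. Ring buffer B holding the cached states, covariances, and inputs
2. New measurement z with physical timestamp t_m and covariance R
3. Sampling interval Δt, maximum admissible delay τ_max, χ²-gating threshold
Outputs:
1. Updated state and covariance at the current time t_k, and the updated buffer B
Steps:
1. Compute the delay τ = t_k − t_m
2. if τ > τ_max then
3.     reject z and return
4. end if
5. Locate the buffer index j whose timestamp t_j matches t_m
6. Retrieve the state and covariance at t_j from B and form the innovation
7. …
8. if the gating statistic exceeds the χ² threshold then
9.     reject z and return
10. end if
11. …
12. Joseph-form update at time t_j
13. …
14. …
15. …
16. for i = j + 1, …, k do // forward replay
17.     Propagate the state and covariance to step i according to the process model in Section 3 using the cached input u_{i−1}
18.     Store the result in B
19. end for
20. …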
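The procedure described for Algorithm 1 can be illustrated with a deliberately small scalar sketch. It is not the authors' implementation: the state is a 1-D position driven by a cached velocity input, the measurement is the position itself (H = 1), and the class and function names, the gate value 6.63 (roughly the 99% χ² quantile for one degree of freedom), and the choice to re-apply cached measurements during replay are all assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Entry:
    t: float                 # timestamp of this buffered step
    x: float                 # posterior state at t (1-D position)
    P: float                 # posterior variance at t
    u: float = 0.0           # input applied from t to the next step
    meas: list = field(default_factory=list)  # (z, R) pairs absorbed at t


def kf_update(x, P, z, R):
    """Scalar measurement update with the Joseph-form variance expression."""
    K = P / (P + R)
    return x + K * (z - x), (1.0 - K) ** 2 * P + K * K * R


class DelayCompensatedKF:
    """Scalar sketch: x_{k+1} = x_k + u_k * dt + w,  z = x + v."""

    def __init__(self, x0, P0, dt, Q, tau_max, gate=6.63):
        self.dt, self.Q, self.tau_max, self.gate = dt, Q, tau_max, gate
        maxlen = int(round(tau_max / dt)) + 1   # buffer spans tau_max
        self.buf = deque([Entry(0.0, x0, P0)], maxlen=maxlen)

    def predict(self, u):
        last = self.buf[-1]
        last.u = u                               # cache the input for replay
        self.buf.append(Entry(last.t + self.dt,
                              last.x + u * self.dt, last.P + self.Q))

    def update_delayed(self, z, t_m, R):
        """Absorb z taken at time t_m; return True if it was accepted."""
        tau = self.buf[-1].t - t_m
        if tau > self.tau_max or t_m < self.buf[0].t - 0.5 * self.dt:
            return False                         # outside the buffer: reject
        # Locate the buffered step closest to the physical timestamp.
        j = min(range(len(self.buf)), key=lambda i: abs(self.buf[i].t - t_m))
        e = self.buf[j]
        nis = (z - e.x) ** 2 / (e.P + R)         # normalized innovation squared
        if nis > self.gate:
            return False                         # gated out
        e.x, e.P = kf_update(e.x, e.P, z, R)     # update at time t_j
        e.meas.append((z, R))
        for i in range(j + 1, len(self.buf)):    # forward replay to t_k
            prev, cur = self.buf[i - 1], self.buf[i]
            cur.x, cur.P = prev.x + prev.u * self.dt, prev.P + self.Q
            for z_i, R_i in cur.meas:            # re-apply later measurements
                cur.x, cur.P = kf_update(cur.x, cur.P, z_i, R_i)
        return True
```

A typical use is to call `predict` at every step and `update_delayed` whenever a station measurement arrives, passing the timestamp at which it was taken rather than the arrival time. The replay loop touches at most one buffer length of entries, which is consistent with the roughly linear cost growth reported for Figure 5.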
In summary, DC-EKF updates at the measurement’s physical time and replays forward along the timeline, systematically repairing the consistency violations caused by delayed and OOS acoustic observations. It standardizes the absorption of late information and suppresses estimator jumps. Under moderate delays the method decisively outperforms uncompensated baselines without sacrificing real-time performance; under larger delays it still maintains a clear advantage and yields smoother error trajectories that better support subsequent pose and trajectory alignment. Compared with a delay-free baseline that simply assumes instantaneous access to all measurements (Standard-EKF), DC-EKF explicitly handles late information from the MDS and turns it into usable constraints under realistic delayed-link conditions, providing more consistent state inputs at terminal docking. Together, Figures 2–7 and the runtime statistics form a coherent body of evidence: the figures capture delay-dependent trends, the effect of different delay distributions, and time-domain behavior, while the counters show that replay triggers and computational load remain within engineering limits. The next section introduces adaptive weighting and threshold calibration to further improve robustness and information efficiency under fluctuating link quality.
7. Conclusions
In ice-covered and otherwise challenging acoustic environments with strong latency, jitter, and intermittent availability, we addressed online state estimation for AUV–mobile docking station (MDS) cooperative docking and validated an engineering-ready solution that balances temporal consistency with closed-loop consistency. The framework comprises: (i) a delay-compensated EKF (DC-EKF) built on a ring buffer with backward–forward replay; (ii) a quality-adaptive layer with learned delay thresholds and dynamic confidence weighting; and (iii) a cooperative augmented EKF (Co-Aug-EKF) that jointly estimates both platforms in a unified reference frame. The layers have clear roles and interfaces: the DC-EKF standardizes the absorption of delayed/OOS observations and preserves consistency; the adaptive layer decides admission and influence; and the Co-Aug-EKF achieves cross-platform closed-loop estimation and suppresses relative drift. Simulations show that, without sacrificing real-time performance, the framework significantly improves absolute and relative accuracy, estimator stability, and tolerance to adverse link conditions.
First, the DC-EKF resolves the structural question of how to inject late information without breaking consistency. By updating at the measurement’s physical time and replaying forward, it avoids time misalignment and estimator jumps that arise when stale data are applied to the current state; Joseph-form updates and spectral projection maintain positive definiteness and numerical stability. Under typical moderate delays, experiments show markedly smoother error histories with far fewer spikes. As the maximum delay grows, performance remains superior to the uncompensated and fixed-lag baselines and gradually converges toward the onboard-only dead-reckoning level, exhibiting graceful degradation and operational controllability.
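A minimal sketch of the two numerical safeguards named above is given here. The Joseph-form expression is the standard one; the text does not define "spectral projection" in this section, so the example assumes it means symmetrizing the covariance and clipping its eigenvalues at a small positive floor. The function names and the floor value are illustrative.

```python
import numpy as np


def joseph_update(x, P, z, H, R):
    """Measurement update using the Joseph-form covariance expression."""
    S = H @ P @ H.T + R                     # innovation covariance
    K = np.linalg.solve(S, H @ P).T         # K = P H^T S^{-1} (P, S symmetric)
    I_KH = np.eye(len(x)) - K @ H
    x_new = x + K @ (z - H @ x)
    P_new = I_KH @ P @ I_KH.T + K @ R @ K.T
    return x_new, P_new


def project_spd(P, floor=1e-9):
    """Symmetrize P and clip its eigenvalues at a small positive floor."""
    P = 0.5 * (P + P.T)
    w, V = np.linalg.eigh(P)
    return (V * np.maximum(w, floor)) @ V.T  # V diag(max(w, floor)) V^T
```

The Joseph form stays symmetric and positive semidefinite for any gain, which matters when gains are modified by gating or weighting. The projection is a fallback for round-off accumulated over long replays.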
Second, the proposed combination of delay threshold calibration and adaptive confidence weighting bridges information quality and fusion weight in an interpretable, online-learned manner. Using effective information gain, NEES filtering, and percentile updates, the learned threshold converges around 6.35 s; during online operation, 1067 station measurements were admitted and 30 rejected, indicating a gate that is neither overly strict nor permissive. Dynamic weights, driven by innovation statistics, delay, and SNR proxies, shift over time from IMU-dominant to station-dominant and stabilize near IMU ≈ 0.16 and Station ≈ 0.84. Implemented as adaptive measurement covariances, this modulation reduces NIS/NEES exceedances and suppresses spike amplitudes, keeping the error history smooth and controllable despite link quality fluctuations. The full adaptive run yields an overall position RMSE of 5.751 m over the 100 s docking segment, demonstrating the joint benefit of thresholding, replay, and weighting. In the context of the large survey-class AUV and 1.5 m-diameter funnel described in Section 3.1, this accuracy is adequate to keep the vehicle within the acoustic homing corridor and to hand over to the sub-meter relative estimates of the Co-Aug-EKF and, in practice, to short-range optical docking aids in the final meters when water clarity permits.
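One hypothetical way to realize a percentile-driven delay threshold and confidence weighting through the measurement covariance is sketched below. The exact update rules are not given in this section, so the smoothing rate, the percentile level, the inverse-weight covariance scaling, and all function names are assumptions. Only the admitted and rejected counts are taken from the text.

```python
def percentile(values, q):
    """Linear-interpolation percentile of a non-empty list, q in [0, 100]."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def update_threshold(current, useful_delays, q=95.0, rate=0.1):
    """Move the delay threshold a fraction `rate` toward the q-th percentile
    of the delays of measurements that were judged useful in this batch."""
    if not useful_delays:
        return current
    return (1.0 - rate) * current + rate * percentile(useful_delays, q)


def effective_variance(R, weight, w_min=1e-3):
    """Inflate the nominal variance R by the inverse confidence weight."""
    return R / min(max(weight, w_min), 1.0)


# Admission rate implied by the counts reported in the text.
admitted, rejected = 1067, 30
admission_rate = admitted / (admitted + rejected)
```

With this scaling, a low-confidence measurement enters the filter with a larger variance and therefore a smaller gain, which is one simple way to down-weight it without rejecting it outright.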
Third, the Co-Aug-EKF jointly models AUV and MDS states in a unified frame and fuses bidirectional relative observations together with soft relative-kinematics constraints in a single Bayesian update. By explicitly maintaining the AUV–MDS cross-covariance block, information from either platform is passively propagated to the other, avoiding the “correlation forgetting” inherent to independent filters. In our comparison, AUV position RMSE drops from 0.395 m to 0.310 m (21.5%), MDS position RMSE from 0.228 m to 0.205 m (10.1%), and relative-position RMSE, the quantity most directly tied to docking, from 0.347 m to 0.280 m (≈19%). Boxplots show left-shifted medians and interquartile ranges with fewer outliers, indicating gains in both mean-square performance and robustness. Trajectory overlays confirm markedly improved relative consistency near the docking corridor, yielding cleaner state inputs for terminal attitude alignment and control convergence.
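The role of the cross-covariance block can be seen in a two-state toy example, with one scalar position per platform. All numbers are illustrative and not from the paper. A measurement of the MDS state alone also corrects the AUV state when the off-diagonal term is kept, and leaves it untouched when the two filters are treated as independent.

```python
import numpy as np


def kf_update(x, P, z, H, R):
    """Standard linear Kalman measurement update."""
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    return x + K @ (z - H @ x), (np.eye(len(x)) - K @ H) @ P


x = np.array([10.0, 0.0])          # [AUV, MDS] positions (toy values)
H = np.array([[0.0, 1.0]])         # the measurement observes the MDS only
R = np.array([[1.0]])
z = np.array([1.0])

P_joint = np.array([[4.0, 3.0],    # off-diagonal term = cross-covariance
                    [3.0, 4.0]])
P_indep = np.diag([4.0, 4.0])      # cross-covariance discarded

x_joint, P_joint_new = kf_update(x, P_joint, z, H, R)
x_indep, P_indep_new = kf_update(x, P_indep, z, H, R)
```

In the joint case the AUV variance drops from 4.0 to 2.2 and its estimate shifts, while in the independent case both remain unchanged. This is the passive propagation effect described above, in its simplest form.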
In terms of computational cost, the marginal overhead of delay compensation arises from one historical update plus a limited number of forward replays and grows approximately linearly with the buffer length. The adaptive layer uses batched percentile updates and constant-time recursions, so its cost is well below that of replay. In the proposed single-AUV–single-MDS configuration, the augmented state collects the pose, velocity, and inertial biases of both platforms together with a small number of soft-alignment states (extrinsic and clock offsets), leading to a state dimension n on the order of a few tens. The Co-Aug-EKF maintains a dense n × n covariance matrix, so both memory and the dominant matrix operations scale as O(n²); with such a modest n, the resulting cost is negligible compared with the acoustic sampling interval and remains well within real-time limits for typical embedded CPUs. This quadratic behavior is not specific to the proposed method; it arises whenever a centralized EKF maintains a full covariance matrix over an augmented multi-robot state with all pairwise cross-covariances. In a centralized multi-AUV extension with N vehicles and one (or several) stations, the augmented state dimension grows approximately as O(N), while the covariance matrix becomes block dense with O(N²) cross-covariance blocks; both memory and the dominant covariance update operations therefore scale quadratically with N. For small teams, this cost remains acceptable for real-time execution. For larger fleets, one would combine the proposed framework with sparse or clustered covariance structures, keeping only local cross-covariances and relying on distributed or information-form fusion for long-range coupling, to keep computation and communication bounded.
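The quadratic growth can be made concrete with a small bookkeeping helper. The figure of 15 states per platform (pose, velocity, and inertial biases) is an assumption chosen so that the two-platform case lands in the "few tens" range mentioned above; the soft-alignment states are omitted, and the function name is hypothetical.

```python
def covariance_footprint(n_vehicles, n_stations=1,
                         states_per_platform=15, bytes_per_entry=8):
    """Size of a dense centralized covariance for a given team."""
    platforms = n_vehicles + n_stations
    n = platforms * states_per_platform          # augmented state dimension
    return {
        "state_dim": n,
        "cov_entries": n * n,                    # dense n x n matrix
        "cov_bytes": n * n * bytes_per_entry,    # double precision by default
        "cross_blocks": platforms * (platforms - 1) // 2,  # pairwise blocks
    }
```

Under these assumptions, one AUV and one station give a 30-dimensional state with 900 covariance entries, while three AUVs and one station give 60 states and 3600 entries: doubling the number of platforms quadruples the covariance storage.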
Overall, under realistic low-bandwidth, high-latency constraints, the framework achieves a practical balance among real-time operation, accuracy, and consistency, and its scaling behavior is well characterized for multi-agent extensions.
The benefits are modulated by geometry and link conditions. With sparse bidirectional observations, severe geometric degeneracy, or overly weak relative-constraint covariances, the advantage diminishes; conversely, with adequate observation frequency, geometry improved via FIM-guided path design, and effective thresholding/down-weighting of poor-quality data, the cross-covariance “passive gain” becomes more pronounced. A dedicated sensitivity study (Section 4) varying the delay distribution (uniform, truncated normal, exponential-like, and a short–long mixture) under a fixed τ_max shows that the DC-EKF RMSE changes by less than about 4%, indicating that performance depends mainly on the delay bound rather than on the precise delay statistics. In addition, a robustness experiment (Section 4) subjects the DC-EKF to heavy-tailed Laplace, Gaussian-mixture, and biased-mixture acoustic noise while the filter still assumes Gaussian measurement noise; in all cases the DC-EKF remains close to the delay-free Standard-EKF baseline (≈1.3 m vs. 0.82–0.85 m) and its RMSE varies by less than about 5%, whereas naïve delayed-fusion schemes stay in the ≈20–24 m range. Thus, moderate deviations from the Gaussian-noise hypothesis mainly cause a modest, bounded inflation relative to the idealized baseline. Nevertheless, strongly bursty, nonstationary, or extremely heavy-tailed disturbances, together with aggressive maneuvers and significant model mismatch, can still erode consistency and motivate future work on more robust statistics (e.g., Student’s-t or Huber-type updates) or invariant-error formulations.
For deployment, the method exposes clear interfaces and portability. The DC-EKF serves as a front-end for time rectification and consistency-preserving absorption of heterogeneous external observations (USBL, SBL, acoustic bearing, etc.). The adaptive threshold and weights have few hyperparameters and converge online. The Co-Aug-EKF serves as a unified-frame back-end that couples loosely, via information matrices, to upstream outputs, simplifying incremental integration into existing navigation and control stacks. This layered design provides standardized interfaces for future integration with path planning, observability-aware geometry optimization, mission management, and docking control.
Despite these strengths, the present work has several practical limitations. First, all evaluations are simulation-based under numerically realistic but still simplified models; pool and sea trials are needed to verify robustness under real acoustic channels, vehicle–environment interactions, and implementation imperfections. Second, the study focuses on a single-AUV–single-station configuration; although the layered architecture and block-structured covariance naturally extend to small teams, large fleets would face stricter constraints from the quadratic growth of cross-covariance blocks and from communication budgets. Third, we assume bounded delays and approximately Gaussian noises with first-order linearization; strongly bursty, nonstationary, or heavy-tailed disturbances, as well as aggressive maneuvers and model mismatch, may erode consistency and would require more robust statistics or invariant-filtering variants. These limitations delineate the regime in which the current framework is most applicable and motivate the extensions outlined below.
Promising directions include: (1) robust statistics for heavy-tailed/multimodal noise and Student’s-t/mixture residual models, or invariant filtering on Lie groups to improve linearization stability; (2) extension to networked multi-AUV/multi-station localization, where the centralized Co-Aug-EKF would be generalized to larger fleets and combined with sparse or partially decentralized updates to manage the O(N²) cross-covariance structure while balancing information consistency against communication and computation budgets; (3) information-guided active sensing and geometry optimization that couple observation selection with platform motion under a communication budget; and (4) sea trials closer to engineering conditions to evaluate threshold convergence, weight migration, and closed-loop performance across sea states, occlusions, and geometries, and to tightly couple estimation with docking control for an end-to-end observe–estimate–control loop.
In summary, by combining temporally consistent absorption in the DC-EKF, quality-matched thresholds and weights, and unified cooperative estimation in the Co-Aug-EKF, we deliver reliable state awareness for AUV–MDS cooperative docking under stringent acoustic constraints. The framework bridges theory and practice: it adheres to clear probabilistic and statistical-consistency principles while remaining simple and deployable in computation and implementation. Although the present study focuses on a single-AUV–single-station configuration, the same layered architecture and block-structured covariance are directly compatible with small multi-AUV fleets, and the accompanying scalability discussion clarifies how computational load and cross-covariance management grow with the number of cooperating vehicles. As tasks and environments grow more demanding, this renders the proposed approach a foundational capability that can be extended and integrated into higher-level planners, offering a reusable pathway to high-precision, robust docking and cooperative operations in the ocean.