1. Introduction
Autonomous Underwater Vehicles (AUVs) have become indispensable for deep-sea exploration, long-term monitoring, and subsea infrastructure maintenance. Reliable underwater docking is essential for persistent AUV operation, enabling energy replenishment, high-bandwidth data exchange, and extended seabed residency [1,2]. Most existing docking frameworks adopt a hybrid layered guidance architecture: acoustic systems provide robust long-range localization in large-scale underwater environments, while optical sensing delivers the high-precision perception required for terminal alignment during short-range docking. Beyond this hybrid framework, recent advances in long-range navigation have introduced dynamic-model-based integrated navigation approaches, which improve state-estimation robustness under complex motion and environmental disturbances [3]. At the docking-control level, emerging studies have explored flow-field-based compliance docking to enhance passive interaction and hydrodynamic adaptability during final contact [4]. Since acoustic guidance techniques have matured and offer stable long-range performance, they are not the focus of this study. Instead, this work concentrates on short-range optical docking, where high-accuracy, low-latency perception is the key enabler for terminal docking [5].
The performance of optical docking is jointly influenced by beacon configuration and detection robustness. Previous studies have attempted to extend detection range through enhanced optical power, optimized wavelengths, and structured beacon geometries [6,7]. Traditional threshold- or feature-based detection methods show limited adaptability under varying turbidity and illumination. Deep learning-based detectors improve robustness under such conditions, but challenges remain in scale variation, intermittent visibility, and computational load, which hinder real-time deployment on embedded AUV platforms [8,9].
Representative studies highlight these limitations. Li et al. employed a single high-power beacon, which increased docking range but provided insufficient geometric constraints for precise pose estimation and was highly sensitive to occlusion and camera viewpoint [10]. Zhao et al. proposed a dual-type marker fusion method for underwater visual localization. They identified a key limitation of single visual markers: small markers are undetectable at long distances, while large markers exceed the camera field of view (FOV) at close range, causing distance-dependent tracking discontinuity [11]. Yan et al. introduced asymmetric L-shaped arrays to alleviate detection loss, but these remained sensitive to pitch and roll variations and provided limited 3D pose information [12]. Furthermore, deep learning-based detection is prone to frame-to-frame intermittency ("flicker"), caused both by physical factors such as light scattering and illumination fluctuations [13] and by algorithmic instability under low contrast and multi-scale distortion [14]. Such flicker severely undermines continuous pose estimation and short-range docking stability.
In contrast, this work aims to enhance detection continuity and robustness for optical docking by jointly addressing beacon design, multi-distance perception, flicker suppression, and real-time deployability. The main contributions are summarized as follows:
- (1) A multi-dimensional underwater optical docking system based on a conical docking station architecture is developed to support stage-based optical guidance. A spatial modeling tool is further introduced to evaluate identifiable regions and assist in beacon configuration analysis.
- (2) An adaptive optical guidance algorithm is designed to improve beacon detection reliability in dynamic underwater conditions through dynamic prediction and validation mechanisms rather than parameter tuning. By utilizing spatiotemporal tracking and short-term prediction–correction, detection stability is enhanced without introducing additional sensing modalities.
- (3) A lightweight beacon detection model is developed to reduce parameter scale and computational load while retaining acceptable accuracy, enabling real-time deployment on AUV hardware.
The remainder of this paper is organized as follows: Section 2 reviews related optical docking research; Section 3 describes the beacon system and guidance space model; Section 4 presents the adaptive spatiotemporal algorithm; Section 5 introduces lightweight model optimization; Section 6 validates the framework through pool and lake experiments; Section 7 concludes with limitations and future work.
2. Related Works
Recent optical docking studies for AUVs can be broadly divided into two complementary directions: improvements in beacon hardware and advances in beacon detection algorithms. Table 1 summarizes the major developments in these areas, highlighting representative designs and their reported performance across different water environments.
Optical docking hardware has progressed from single-light beacons to multi-source arrays and, more recently, hybrid optical–EM or optical–passive configurations. Single high-power lamps offer extended visibility but provide limited geometric information for accurate pose estimation. Multi-light arrays improve near-range observability and 3D pose accuracy, yet their effective range is constrained by attenuation, scale reduction, and camera FOV limitations. Consequently, existing beacon designs often exhibit either long-range detectability or short-range precision, depending on configuration characteristics. Hybrid systems extend usable distance through sensing redundancy, but at the cost of additional hardware and more complex perception algorithms.
From the perception perspective, early optical docking relied on threshold segmentation or connected-component methods, which are sensitive to turbidity fluctuations and illumination changes. More recently, deep learning approaches, particularly YOLO, have been increasingly adopted for underwater beacon detection, significantly improving robustness under variable underwater conditions. However, computational load and model size remain important considerations for embedded AUV platforms. Moreover, intermittent visibility and frame-level detection flicker caused by scattering, low contrast, and scale variation are still common challenges, and many existing approaches focus primarily on improving per-frame detection rather than maintaining continuity across the entire guidance process.
Overall, prior work has advanced beacon design, perception robustness, and multi-sensor fusion, but these components are often addressed independently according to specific system requirements. This motivates further research on short-range optical docking methods that emphasize detection continuity, robustness to visibility fluctuations, and real-time feasibility on embedded hardware, guiding the design of the system and algorithms presented in this work. It should be noted that the performance metrics reported in different studies are not directly comparable, owing to variations in water conditions, evaluation criteria, and experimental setups; Table 1 therefore summarizes only publicly available data without implying strict quantitative comparison.
3. Design of Multidimensional Underwater Docking System
3.1. Underwater Docking Scheme
The conical docking station consists of an AUV receiving chamber and a conical docking port, integrating acoustic beacons, optical beacons, environmental sensors, and communication modules. However, during terminal docking, short-range perception is often insufficient and optical guidance becomes unstable under complex water conditions, resulting in degraded docking accuracy. To enhance the continuity of optical guidance throughout the final approach, this study proposes a multi-dimensional optical docking strategy, as illustrated in Figure 1. The overall process is divided into three collaborative stages as follows:
After the AUV approaches the station via acoustic navigation, the onboard camera continuously searches for optical beacons. Once a stable optical signal is detected, the system switches from acoustic to optical guidance. Long-distance beacons provide azimuth information, and the AUV performs course correction using a Line-of-Sight (LOS) scheme to gradually reduce heading deviation.
As the AUV continues to approach the docking station and identifies more than three valid optical beacons, it enters the middle-distance guidance zone. At this stage, the optical guidance system outputs the precise pose of the AUV relative to the docking station, establishing an accurate estimate of the AUV's underwater spatial state and enabling high-precision docking.
When the AUV reaches a distance of 2–3 m from the docking station, the system switches to the close-range guidance scheme. At this stage, the AUV identifies the red guidance light at the docking station entrance as the guidance indicator, completing the final step of optical guidance docking by LOS.
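For concreteness, the following is a minimal sketch of the image-space LOS heading correction used in the long- and close-range stages, assuming a pinhole camera model; the function name and the calibration values in the usage comment are illustrative, not taken from the implementation.

```python
import math

def los_heading_correction(u_beacon: float, cx: float, fx: float) -> float:
    """Bearing of the detected beacon relative to the optical axis (radians).

    A minimal line-of-sight (LOS) correction: the horizontal pixel offset of
    the beacon centroid is converted to a bearing angle via the pinhole
    model, and the AUV steers to drive this bearing to zero.
    u_beacon : beacon centroid column (pixels)
    cx, fx   : principal point column and focal length (pixels),
               from camera calibration
    """
    return math.atan2(u_beacon - cx, fx)

# Hypothetical usage: feed the bearing to the vehicle's heading controller
# psi_cmd = psi_current + los_heading_correction(u, cx=960.0, fx=1100.0)
```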
3.2. Stereo Optical Beacon Design
The new stereo optical beacon system is shown in Figure 2. Unlike traditional single-mode optical beacon systems, this design employs a combination of multiple active light modes: a high-power blue LED for long-distance guidance, a blue LED array for medium-distance guidance, and red LEDs for short-distance docking. The specific parameters of these three beacon types are provided in Table 2. By integrating these three types of active light, the system subdivides the underwater optical guidance space into long-, medium-, and short-distance stages, enabling a structured and progressive approach to precise docking.
To verify the usability of the stereo optical beacon system, experiments were conducted in a pool with dimensions of 80 m × 15 m × 30 m. The water in the pool was relatively clear, with an attenuation coefficient of 0.40 m⁻¹. Using the YOLO-based optical detection model adopted in this study, we tested detection of the three beacon types at distances of 2 m, 10 m, and 35 m, respectively. The results show that all three beacon types meet the detection conditions within their designed ranges; the recognition results are shown in Figure 3.
3.3. Underwater Optical Guidance Space Model with Directional Propagation
The middle-distance guidance phase is critical for the AUV's precise docking. Here, the AUV computes its pose relative to the docking station via the Perspective-n-Point (PnP) algorithm, whose accuracy directly dictates the final guidance performance. PnP requires at least four pairs of corresponding 2D–3D coordinates for accurate pose calculation. On this basis, this study constructs an underwater optical guidance space model for simulation studies. The model uses the docking station's coordinate system shown in Figure 4 as the reference frame, and all subsequent coordinate systems in this study are based on it.
3.3.1. Light Source Model
Radiant flux is a fundamental physical quantity characterizing the total radiant energy emitted by a light source per unit time across all directions in space. To simplify the underwater optical beacon light source model and focus on the key physical mechanisms of light propagation, this study adopts an ideal light source assumption: the total radiant flux of the optical beacon, denoted by $I$, is determined solely by the light source's inherent properties. Its value is jointly constrained by the beacon's rated electrical power and electro-optical conversion efficiency, representing the basic radiant energy output level. In underwater optical guidance scenarios, the optical beacon's beam does not diverge omnidirectionally but is emitted directionally as a cone. Therefore, solid angle normalization should be applied to the total radiant flux [25]. For a beam with a half-angle divergence of $\theta_{1/2}$, per the solid angle formula for a spherical cap, the solid angle $\Omega$ of the conical beam satisfies:

$$\Omega = 2\pi\left(1 - \cos\theta_{1/2}\right)$$

Based on this, the effective initial light intensity $I_0$ is defined as the normalization of the total radiant flux with respect to the solid angle, i.e., the radiant flux per unit solid angle:

$$I_0 = \frac{I}{\Omega} = \frac{I}{2\pi\left(1 - \cos\theta_{1/2}\right)}$$

The physical meaning of this definition is that if the total radiant flux $I$ remains constant and the beam is confined to a smaller solid angle through optical designs such as a condenser lens, the radiant flux per unit solid angle $I_0$ increases accordingly. This concentrates the total radiant energy within a narrower angular range, thereby accurately capturing the initial intensity characteristics of the beam during directional propagation.
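As a numerical check of the two expressions above, the following short sketch computes $I_0$ from the flux and half-angle; the flux and angle values are illustrative, not the beacon parameters of Table 2.

```python
import math

def effective_initial_intensity(flux_w: float, half_angle_deg: float) -> float:
    """Radiant flux per unit solid angle (W/sr) for a conical beam.

    Implements I0 = I / (2*pi*(1 - cos(theta_half))), the spherical-cap
    solid-angle normalization. flux_w is the total radiant flux I (W),
    itself the product of rated electrical power and electro-optical
    conversion efficiency.
    """
    omega = 2.0 * math.pi * (1.0 - math.cos(math.radians(half_angle_deg)))
    return flux_w / omega

# Example: a 50 W-flux beacon focused into a 45-degree half-angle cone
print(effective_initial_intensity(50.0, 45.0))  # ~27.2 W/sr
```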
3.3.2. Underwater Light Propagation Attenuation Model
Table 3 summarizes the core parameters of underwater light propagation, covering key physical quantities such as light source radiation characteristics, water optical parameters, and turbulence characteristics, providing the basic inputs for light propagation modeling [26]. Among them, the scattering and absorption coefficients take different values in different underwater environments; the value of the asymmetry factor $g$ depends on the type of particles in the water [27]; and the turbulent disturbance depends on the water flow, which can be measured with an Acoustic Doppler Current Profiler (ADCP).
This study comprehensively considers the energy attenuation due to absorption and scattering, as well as the asymmetry of the scattering direction, when simulating underwater light propagation. Absorption by water molecules and suspended particles directly consumes light energy. When light is scattered by particles, not all scattered light contributes to the intensity at the target point; only part of it remains on the effective propagation path. The directional preference of this scattering is described by the scattering asymmetry factor $g$: when $g > 0$, scattering is mainly forward; when $g < 0$, scattering is mainly backward; and when $g = 0$, scattering is isotropic [28]. Combined with the simplified Lambert–Beer model, a particle attenuation factor $\tau(d)$ is introduced:

$$\tau(d) = \exp\!\left[-\left(a + (1 - g)\,b\right)d\right]$$

where $a$ and $b$ are the absorption and scattering coefficients of the water and $d$ is the propagation distance.
Underwater turbulence manifests as random spatiotemporal fluctuations in the refractive index of water, which cause random deflections in the beam propagation direction. For a directionally emitted beam, its main direction $\theta$ (describing the tilt of the beam relative to the z-axis) and its planar distribution angle $\varphi$ (describing the azimuth of the beam in the XOY plane) undergo random modulation due to the dynamic interference of turbulence. To this end, the model introduces a Gaussian random variable $\delta$ with a mean of 0 and a standard deviation of $\sigma_{turb}$ to simulate beam jitter:

$$\theta' = \theta + \delta_{\theta}, \qquad \varphi' = \varphi + \delta_{\varphi}, \qquad \delta_{\theta}, \delta_{\varphi} \sim \mathcal{N}\!\left(0, \sigma_{turb}^{2}\right)$$
The conically emitted beam is subject to particle scattering, turbulent disturbance, and other factors, so weak light theoretically remains detectable in regions well beyond the beam's half-divergence angle. To simplify the model while preserving the core characteristic of directional propagation, the angular attenuation of the beam is treated piecewise. Let $\alpha$ denote the angle between the vector from the docking station's origin to a point in space and the turbulence-perturbed central axis of the beam; the angular attenuation of the beam is then described by the piecewise function

$$f_{\theta}(\alpha) = \begin{cases} 1, & \alpha \le \theta_{1/2} \\[4pt] \exp\!\left[-\dfrac{\left(\alpha - \theta_{1/2}\right)^{2}}{2\sigma_{\alpha}^{2}}\right], & \alpha > \theta_{1/2} \end{cases}$$

The standard deviation $\sigma_{\alpha}$ of the Gaussian distribution is a key parameter characterizing the statistical scale of the beam's angular diffusion; its value is determined by jointly considering the beam's half-divergence angle and the angular disturbance induced by particle scattering.
The divergence of the beam with propagation distance $d$ attenuates the light intensity per unit area. In the underwater optical guidance scenario, the beam is emitted directionally as a cone, and its attenuation law must jointly account for factors such as the propagation distance and the total underwater attenuation coefficient. In the model, a reference distance $d_0$ and an attenuation exponent $n$ are introduced to construct a distance attenuation factor $f_d(d)$:

$$f_d(d) = \left(\frac{d_0}{d}\right)^{n}$$
In summary, when light travels from the light source to a spatial point $P$, the light intensity $I(P)$ at $P$ satisfies:

$$I(P) = I_0 \cdot f_{\theta}(\alpha) \cdot f_d(d) \cdot \tau(d)$$

Through the establishment of the above physical model, the precise guidance space $S$ is defined as the region of space satisfying:

$$A_k = \left\{ P \mid I_k(P) \ge I_{\min} \right\}, \qquad S = \left\{ P \;\middle|\; \sum_{k=1}^{N} \mathbb{1}\left(P \in A_k\right) \ge 4 \right\}$$

Herein, $I_{\min}$ denotes the minimum observable light intensity, $\mathbb{1}(\cdot)$ is an indicator function (1 if the condition is satisfied, 0 otherwise), and $A_k$ represents the illumination area of the $k$-th lamp.
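Putting the factors together, the following is a compact sketch of evaluating $I(P)$ and the membership test for $S$ under the assumptions above; all coefficient values are illustrative placeholders rather than calibrated parameters, and turbulence jitter is omitted for brevity.

```python
import numpy as np

# Illustrative water/beacon parameters (not the paper's calibrated values)
A_COEF, B_COEF, G = 0.18, 0.22, 0.8        # absorption, scattering, asymmetry
D0, N_EXP = 1.0, 2.0                        # reference distance, attenuation exponent
THETA_HALF = np.radians(45.0)               # beam half-divergence angle
SIGMA_ALPHA = np.radians(10.0)              # angular diffusion scale
I0, I_MIN = 27.0, 1e-3                      # initial intensity, detection threshold

def intensity(point, lamp_pos, lamp_axis):
    """I(P) = I0 * f_theta(alpha) * (d0/d)^n * exp(-(a + (1-g)b) d).

    lamp_axis must be a unit vector along the beam's central axis.
    """
    v = np.asarray(point, float) - np.asarray(lamp_pos, float)
    d = np.linalg.norm(v)
    alpha = np.arccos(np.clip(v @ lamp_axis / d, -1.0, 1.0))
    f_theta = 1.0 if alpha <= THETA_HALF else \
        np.exp(-(alpha - THETA_HALF) ** 2 / (2 * SIGMA_ALPHA ** 2))
    f_dist = (D0 / d) ** N_EXP
    tau = np.exp(-(A_COEF + (1.0 - G) * B_COEF) * d)
    return I0 * f_theta * f_dist * tau

def in_guidance_space(point, lamps):
    """Point belongs to S if at least 4 lamps are observable (I_k >= I_min)."""
    visible = sum(intensity(point, pos, axis) >= I_MIN for pos, axis in lamps)
    return visible >= 4
```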
3.3.3. Simulation of Precise Guidance Space
The constructed underwater precise guidance spatial model integrates light source radiation characteristics and underwater light propagation laws, facilitating quantitative assessment of the docking system’s coverage and reliability.
Table 4 summarizes the size of the underwater precise guidance space formed by optical beacons under different beacon counts and beam angle settings. The simulation was conducted for a conical docking station with a 2 m diameter; the results indicate that installing more active beacons yields a larger precise guidance space, though at the cost of higher docking station energy consumption. The optical beacon's beam angle also influences the overall size of the guidance space: under the current docking station design, the maximum precise guidance space is achieved with a beam angle of 40–60°. This study also found that offsetting the irradiation direction of the beacon light inward by a specific angle can effectively expand the precise guidance space.
Table 5 shows the simulated guidance space of the proposed six-beacon docking station (45° beam divergence) under different included angles between the beacon irradiation direction and the positive z-axis. Compared with vertical installation, this setup achieves a 25% expansion of the guidance space.
In summary, the multi-dimensional docking system effectively expands the available guidance space and addresses short-range perception limitations. However, the ultimate success of docking still relies on stable beacon detection and accurate pose estimation in the dynamic underwater environment. To this end, the next section introduces an adaptive spatiotemporal guidance algorithm.
4. Spatiotemporally Adaptive Association Algorithm for AUV Underwater Optical Guidance
Many flickering artifacts appear when applying the YOLO-based optical beacon detection algorithm, severely impacting the stability of underwater optical guidance. To overcome this limitation, this study develops a new AUV optical guidance framework (Figure 5). The framework is specifically designed to improve robustness under complex underwater conditions, where factors such as light scattering, occlusion, and noise frequently disrupt visual detection. It consists of three tightly coupled modules: the Beacon Kalman Tracking–Prediction Module, the Spatiotemporal Beacon Authentication Module, and the PnP Pose Estimation Module with Dynamic Solver Selection [29,30]. Together, these modules form a closed-loop process that adaptively filters, verifies, and revalidates beacon detections in real time. In this context, "adaptive" denotes mechanism-level adaptivity: the algorithm dynamically adjusts its prediction, authentication, and solver-selection processes according to beacon visibility and detection confidence, rather than relying on parameter self-tuning.
4.1. Beacon Kalman Tracking-Prediction Module
Beacon detection in underwater environments is highly sensitive to noise, occlusion, and fluctuations in illumination. To maintain reliable tracking under these disturbances, the proposed system employs a Kalman filter for joint estimation of beacon position and velocity. By combining a prior motion model with real-time observations, the Kalman filter smooths noisy trajectories and predicts beacon positions when detections are temporarily unavailable [31].
The filter models the state transition and measurement processes as

$$\mathbf{x}_k = F\,\mathbf{x}_{k-1} + \mathbf{w}_k, \qquad \mathbf{z}_k = H\,\mathbf{x}_k + \mathbf{v}_k$$

where $\mathbf{x}_k = [u, v, \dot{u}, \dot{v}]^{T}$ denotes the beacon's pixel-space position and velocity, $\mathbf{z}_k$ is the observed image coordinate, and $\mathbf{w}_k$ and $\mathbf{v}_k$ are zero-mean Gaussian process and observation noise [32,33]. For the specific context of underwater docking, a constant-velocity motion model was adopted, as the relative motion between the AUV and the beacons in the image plane is gradual and smooth under low-speed operation. The process noise covariance $Q$ was determined empirically from the frame-to-frame beacon jitter observed in pool experiments, ensuring that the filter remained responsive to short-term occlusions without excessive smoothing. The measurement noise covariance $R$ was derived from the detection confidence of the YOLO-based beacon model, assigning lower uncertainty to high-confidence detections. These parameters were iteratively validated through on-water experiments to ensure stable prediction behavior while avoiding divergence over long prediction intervals [34]. In each frame, the filter predicts the next position and velocity; when a detection is available, the predicted state is updated using the Kalman gain, otherwise only prediction is applied. A maximum prediction window of ten consecutive frames (approximately 0.33 s at 30 fps) is employed to prevent excessive drift during prolonged occlusions. This value was chosen based on the empirical observation that short-term beacon flicker is common during docking, whereas long-duration full occlusions rarely occur in lake or pool environments. This design provides stable trajectories for beacon authentication, suppresses noise through multi-frame screening, and supplies predicted positions to the PnP solver during occlusions.
4.2. Spatiotemporal Beacon Authentication Module
To achieve accurate mapping between detected targets and predefined beacon IDs, the system employs a dynamic authentication mechanism that integrates temporal stability validation (via multi-frame occurrence counts and average confidence) with spatial layout constraints. Notably, to further enhance matching reliability and reduce ambiguity, candidate beacons are sorted in descending order of average detection confidence during authentication, so that higher-confidence detections are granted priority in geometric matching and beacon ID assignment. The core workflow of this mechanism is detailed in Algorithm 1:
Algorithm 1: Dynamic Beacon Authentication Algorithm

Input: detection results over a time window $W$: each entry is (track_id $t_i$, keypoint coordinates $p_i$, confidence $c_i$); thresholds $N_{\min}$ (occurrence count), $C_{\min}$ (confidence), $\epsilon$ (symmetry tolerance).
Output: authentication map $M$: track_id → beacon_id

1: // 1. Multi-frame candidate filtering: discard unreliable tracks by verifying occurrence frequency and average confidence across multiple frames
2: Candidates ← Ø
3: for each track $t_i$ in $W$:
4:   $n_i$ ← number of frames in which $t_i$ appears in $W$
5:   $\bar{c}_i$ ← average confidence of $t_i$ over $W$
6:   if $n_i \ge N_{\min}$ and $\bar{c}_i \ge C_{\min}$: add $t_i$ to Candidates
7: // 2. Geometric symmetry verification: validate candidate pairs via symmetry about the centroid of all candidates
8: $O$ ← centroid of the keypoint coordinates of Candidates
9: Valid ← Ø; Matched ← Ø
10: for each $t_i$ in Candidates:
11:   if $t_i$ ∈ Matched: continue
12:   $p_i'$ ← $2O - p_i$  // symmetric point
13:   for each $t_j$ in Candidates:
14:     if $t_j$ ∉ Matched and $\lVert p_j - p_i' \rVert \le \epsilon$:
15:       add $t_i$, $t_j$ to Valid and to Matched; break
16: // 3. Assign beacon IDs via geometric mapping
17: sort Valid by angle and left–right geometric ordering
18: assign B0–B5 to the sorted points based on the predefined layout
19: $M$ ← map the track_ids in Valid to their assigned beacon IDs
20: return $M$
The core advantages of this mechanism are filtering instantaneous false positives through multi-frame statistics (step 1) to ensure temporal stability of candidates; eliminating spatial anomalies via geometric symmetry verification (step 2) to strengthen consistency with predefined layouts; and achieving precise binding of detections to physical beacons through regularized ID allocation (step 3).
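A condensed Python rendering of Algorithm 1 is sketched below, assuming a sliding window of recent detections per track; the threshold values are illustrative.

```python
import numpy as np

def authenticate_beacons(window, n_min=5, c_min=0.5, eps_sym=12.0):
    """Sketch of Algorithm 1. `window` maps track_id -> list of
    (u, v, confidence) observations over the last frames."""
    # 1. Multi-frame candidate filtering
    cands = {}
    for tid, obs in window.items():
        if len(obs) >= n_min:
            conf = np.mean([c for _, _, c in obs])
            if conf >= c_min:
                pos = np.mean([[u, v] for u, v, _ in obs], axis=0)
                cands[tid] = (pos, conf)
    if not cands:
        return {}
    # Higher-confidence detections get matching priority
    order = sorted(cands, key=lambda t: cands[t][1], reverse=True)
    centroid = np.mean([cands[t][0] for t in cands], axis=0)

    # 2. Geometric symmetry verification about the candidate centroid
    valid, matched = [], set()
    for ti in order:
        if ti in matched:
            continue
        mirror = 2.0 * centroid - cands[ti][0]  # symmetric point of ti
        for tj in order:
            if tj != ti and tj not in matched and \
               np.linalg.norm(cands[tj][0] - mirror) <= eps_sym:
                valid += [ti, tj]
                matched |= {ti, tj}
                break
    # 3. Assign IDs B0..B5 by angular order around the centroid
    valid.sort(key=lambda t: np.arctan2(*(cands[t][0] - centroid)[::-1]))
    return {tid: f"B{i}" for i, tid in enumerate(valid[:6])}
```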
4.3. PnP Pose Estimation Module with Dynamic Solver Selection
The final step is to estimate the pose of the AUV using the established beacon correspondences. This is formulated as a Perspective-n-Point (PnP) problem, where the objective is to determine the camera pose from the known 3D world coordinates of the beacons and their corresponding 2D image projections. To ensure both accuracy and computational efficiency under varying numbers of valid beacons, the system adopts a two-level dynamic optimization strategy.
- (1) Dynamic Algorithm Selection Based on Valid Beacon Count: The PnP solver is selected dynamically according to the number of valid beacons $n$. When $3 \le n \le 4$, iterative PnP is employed, which minimizes the reprojection error via Levenberg–Marquardt optimization. When $5 \le n \le 6$, fast PnP is used, which accelerates the solution process via quadratic programming [35].
- (2) Reprojection Error Verification: After pose calculation, the reliability of the result is assessed using the reprojection error, defined as

$$e = \frac{1}{n}\sum_{i=1}^{n}\left\lVert \mathbf{p}_i - \pi\!\left(K\left[R \mid \mathbf{t}\right]\mathbf{P}_i\right) \right\rVert$$

where $\pi(\cdot)$ is the perspective projection function, $K$ is the camera intrinsic matrix, $R$ is the rotation matrix (obtained from a rotation vector via Rodrigues' formula), $\mathbf{t}$ is the translation vector, $\mathbf{p}_i$ is the detected image coordinate of beacon $i$, and $\mathbf{P}_i$ represents its 3D world coordinate. If the error exceeds a predefined threshold, the result is deemed unreliable. In such cases, the system clears the corresponding beacon authentication data, re-executes the spatiotemporal authentication procedure (Section 4.2), and repeats pose estimation with revalidated correspondences.
This mechanism integrates dynamic solver selection and error-based revalidation to ensure efficient and accurate pose calculations, even when beacon detections are incomplete or corrupted by noise, thereby significantly enhancing the reliability of AUV guidance in complex underwater environments.
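A minimal sketch of this module using OpenCV follows, where SOLVEPNP_ITERATIVE and SOLVEPNP_SQPNP stand in for the iterative and quadratic-programming-based solvers described above; note that OpenCV's iterative solver expects at least four points, so the three-beacon case falls back to SQPnP in this sketch.

```python
import cv2
import numpy as np

def estimate_pose(obj_pts, img_pts, K, dist_coeffs, err_thresh=3.0):
    """Dynamic solver selection plus reprojection-error verification.

    obj_pts: (n, 3) beacon world coordinates; img_pts: (n, 2) pixel detections;
    K, dist_coeffs: camera intrinsics and distortion from calibration.
    Returns (rvec, tvec) or None when the reprojection check fails.
    """
    obj = np.asarray(obj_pts, np.float64)
    img = np.asarray(img_pts, np.float64)
    n = len(obj)
    if n < 3:
        return None
    if n <= 4:
        # iterative (Levenberg-Marquardt) branch; OpenCV's ITERATIVE needs
        # >= 4 points, so a 3-beacon frame falls back to SQPnP here
        flag = cv2.SOLVEPNP_ITERATIVE if n == 4 else cv2.SOLVEPNP_SQPNP
    else:
        flag = cv2.SOLVEPNP_SQPNP  # quadratic-programming-based fast solver
    ok, rvec, tvec = cv2.solvePnP(obj, img, K, dist_coeffs, flags=flag)
    if not ok:
        return None
    # e = (1/n) * sum_i ||p_i - proj(P_i)||
    proj, _ = cv2.projectPoints(obj, rvec, tvec, K, dist_coeffs)
    err = float(np.mean(np.linalg.norm(proj.reshape(-1, 2) - img, axis=1)))
    if err > err_thresh:
        return None  # caller clears authentication and re-runs Section 4.2
    return rvec, tvec
```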
5. Lightweight Optical Beacon Detection Model Based on YOLO
Underwater optical beacon detection faces challenges such as illumination variability, background interference, and scale changes. Traditional methods, including global thresholding and connected-component analysis, lack robustness in such environments and perform poorly on small targets, limiting their applicability [20]. Recent advances in deep learning have significantly improved detection performance under complex underwater conditions, with YOLO-based one-stage detectors achieving a strong balance between efficiency and accuracy. However, the computational resources of AUVs are inherently constrained, and the adaptive guidance strategy designed to stabilize underwater guidance adds considerable computational complexity; together, these impose dual requirements of lightweight design and real-time performance on the optical beacon detection algorithm. Therefore, this section presents a lightweight optimization of the YOLOv8-pose model [36].
5.1. Underwater Optical Beacon Dataset
To support underwater guided vision tasks, this study constructs an active underwater guidance beacon dataset. The dataset comprises underwater images of active optical beacons, primarily derived from 35 docking videos captured by high-resolution specialized underwater camera systems during underwater optical guidance experiments. Frames were extracted at 15-frame intervals, ensuring sufficient inter-frame time gaps to avoid redundancy between successive frames. Some eight-light images were sourced from the UDID dataset [37]. Experimental sites include the South China Sea, Qiandao Lake, and large-scale experimental pools, encompassing diverse aquatic environmental characteristics. In terms of beacon features, the dataset includes optical beacons of varying power levels and covers spectral characteristics across multiple visible-spectrum color bands, thus fully reflecting the diverse color properties that optical beacons may exhibit in underwater environments.
The dataset was labeled using LabelMe 5.8.3 and randomly split into a training set of 6730 images and a validation set of 1290 images. In addition, to further enhance the dataset's diversity and quality while mitigating overfitting risks in small-sample scenarios, a multi-dimensional data augmentation strategy was applied to the collected underwater optical beacon images, including geometric transformations and pixel adjustments. The dataset covers typical underwater lighting degradation scenarios such as high contrast, low definition, severe glare, and multi-light interaction, comprehensively depicting the imaging characteristics of optical beacons in complex underwater environments. Figure 6 presents a random selection of dataset images across these environments.
5.2. Lightweight Model Architecture
YOLOv8n-pose, a YOLO-series model for keypoint detection, outputs optical-center keypoint coordinates after extensive training on the dataset, making it well suited for positioning in underwater optical beacon detection. Its architecture consists of three components: the backbone extracts multi-scale features from underwater images via deep convolution and pooling, capturing beacon texture and contour features layer by layer; the neck fuses multi-scale features and enhances contextual information through the Feature Pyramid Network (FPN) and Path Aggregation Network (PAN); and the head outputs beacon detection results and keypoint positions to finalize positioning and feature extraction. After lightweight optimization, the overall model architecture is presented in Figure 7. Key optimizations aimed at reducing redundancy and enhancing computational efficiency include replacing conventional convolution with GhostConv to construct a lightweight backbone; reconstructing the C2f module in the backbone into a CSPPC structure via Partial Convolution (PConv); and replacing the original C2f module with C3k2 in the neck. The specific implementation mechanisms and performance impacts of these improvements are elaborated below.
- (1) GhostConv Module: GhostConv differs from standard convolution in its operational mechanism: standard convolution generates all feature maps directly by traversing all input channels with multiple kernels, while GhostConv first generates core feature maps containing the key information through a small number of convolutions, then produces complementary "ghost" feature maps via low-cost linear transformations. This design significantly reduces the number of parameters and the computational complexity while retaining representational power [38]. In underwater optical beacon detection, the pronounced contrast between beacons and their surroundings causes standard convolution to over-extract features from both beacon and background regions, producing many redundant feature maps; this phenomenon is particularly prevalent in underwater optical beacon images, with examples shown in Figure 8. These redundant maps mostly consist of repeated representations of beacon or background features, adding computational overhead without practical value for detection. GhostConv significantly reduces such redundancy while retaining the core features of the beacon, making it well suited to this task.
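A minimal PyTorch sketch of the GhostConv idea follows (after the GhostNet formulation; the 5 × 5 depthwise "cheap" operation and the half-and-half channel split are common choices, not necessarily the exact configuration used here).

```python
import torch
import torch.nn as nn

class GhostConv(nn.Module):
    """Ghost convolution sketch: half the output channels come from a dense
    convolution, the other half from a cheap depthwise operation applied to
    those primary feature maps."""

    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        c_primary = c_out // 2
        self.primary = nn.Sequential(              # dense conv -> core features
            nn.Conv2d(c_in, c_primary, k, s, k // 2, bias=False),
            nn.BatchNorm2d(c_primary), nn.SiLU())
        self.cheap = nn.Sequential(                # depthwise -> ghost features
            nn.Conv2d(c_primary, c_primary, 5, 1, 2,
                      groups=c_primary, bias=False),
            nn.BatchNorm2d(c_primary), nn.SiLU())

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)

# e.g. GhostConv(64, 128, k=3, s=2)(torch.randn(1, 64, 80, 80)).shape
# -> torch.Size([1, 128, 40, 40])
```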
- (2) CSPPC Module: The C2f module, a key component in YOLO, implements Cross Stage Partial fusion: it concatenates the outputs of different Bottleneck modules with the original feature map to better aggregate multi-scale information. However, its deep structure and large convolution kernels incur high costs in floating point operations (FLOPs) and memory access. To address this, this work introduces Partial Convolution (PConv) to optimize the C2f module, yielding a new Bottleneck architecture termed CSPPC [39]. The module structure is shown in Figure 9; essentially, the standard convolution in the C2f Bottleneck is replaced by PConv.
Figure 10 shows the operational logic of PConv. For an input feature map $X \in \mathbb{R}^{h \times w \times c}$ and a convolutional kernel of size $k \times k$, PConv selects a subset of $c_p$ channels and applies standard convolution only to that subset for spatial feature extraction. The unselected channels keep their original features and are finally concatenated with the convolved feature maps along the channel dimension, so the output feature map remains $h \times w \times c$. If standard convolution were used, the FLOPs would be

$$\mathrm{FLOPs}_{conv} = h \times w \times k^{2} \times c^{2}$$

whereas PConv requires only

$$\mathrm{FLOPs}_{pconv} = h \times w \times k^{2} \times c_p^{2}$$

so when $c_p$ is $1/4$ of $c$, the FLOPs are reduced to $(c_p/c)^{2} = 1/16$ of standard convolution.
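A minimal PyTorch sketch of PConv under the same notation follows (the channel ratio and kernel size are illustrative).

```python
import torch
import torch.nn as nn

class PConv(nn.Module):
    """Partial convolution sketch (after the FasterNet formulation): a k x k
    convolution is applied to the first c_p = c/ratio channels only; the
    remaining channels pass through untouched and are re-concatenated."""

    def __init__(self, c, k=3, ratio=4):
        super().__init__()
        self.c_p = c // ratio                     # convolved channel subset
        self.conv = nn.Conv2d(self.c_p, self.c_p, k, 1, k // 2, bias=False)

    def forward(self, x):
        x1, x2 = torch.split(x, [self.c_p, x.size(1) - self.c_p], dim=1)
        return torch.cat([self.conv(x1), x2], dim=1)

# FLOPs scale with c_p^2 instead of c^2: with ratio=4 the spatial-mixing
# cost drops to 1/16 of a standard convolution over all channels.
# e.g. PConv(64)(torch.randn(1, 64, 40, 40)).shape -> torch.Size([1, 64, 40, 40])
```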
- (3) C3k2 Module: Finally, we introduce C3k2 to replace the C2f module in the neck (Figure 11). The C3k2 module is YOLO11's optimized version of the traditional CSP Bottleneck structure; its core goal is to improve feature extraction efficiency through parallel convolution design and flexible parameter configuration. Compared with C2f, C3k2 replaces a single convolution with parallel convolutional layers, reducing redundant computation and improving inference speed [40]. The divide-and-conquer extraction logic of parallel convolution captures the beacon's highlighted core and contour details, effectively alleviating interference from underwater scattering and stray light and improving detection rate and positioning accuracy in blurred or low-light scenes. At the same time, the channel segmentation and flexible parameter design of C3k2 reduce computational overhead under the constraints of AUV embedded computing power while keeping inference speed adequate for real-time guidance.
6. Experiments and Results
6.1. Lightweight Optical Beacon Detection Model Experiment
The model was trained using the experimental configuration outlined in Table 6, with key settings including a batch size of 16, a unified input size of 640 × 640 after preprocessing, and 200 training epochs. The optimizer was Adam with an initial learning rate of 0.001 and no momentum term. To enhance training efficiency, mixed-precision training was employed, and a fixed random seed was used to ensure experimental reproducibility. Data augmentation strategies included horizontal flipping and HSV color jittering, designed to account for underwater light variations and beacon attitude changes.
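For reference, this configuration maps naturally onto the Ultralytics training API; the sketch below uses a hypothetical dataset YAML name, and the augmentation gains shown are the framework defaults rather than necessarily the exact values used.

```python
from ultralytics import YOLO

# Sketch of the Table 6 training configuration with Ultralytics' API
model = YOLO("yolov8n-pose.pt")
model.train(
    data="underwater_beacons.yaml",             # hypothetical dataset config
    epochs=200, imgsz=640, batch=16,
    optimizer="Adam", lr0=0.001, momentum=0.0,  # Adam, no momentum term
    seed=0,                                     # fixed seed for reproducibility
    amp=True,                                   # mixed-precision training
    fliplr=0.5,                                 # horizontal flip augmentation
    hsv_h=0.015, hsv_s=0.7, hsv_v=0.4,          # HSV color jitter (default gains)
)
```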
The single-class underwater optical beacon detection model is evaluated using bounding box and keypoint average precision (AP), model parameters, GFLOPs, and inference speed (FPS). AP50(B) measures bounding box localization at IoU 0.5, while AP50(P) and AP50–95(P) assess keypoint prediction accuracy, with AP50–95(P) providing a more stringent evaluation across IoU thresholds from 0.5 to 0.95; model parameters and GFLOPs indicate structural and computational efficiency, and FPS reflects real-time feasibility on AUV platforms.
6.1.1. Comparative Experiment
Table 7 presents the evaluation results of various methods on the test set. Deep learning-based optical beacon detection models exhibit a significant edge over traditional object detection methods in terms of FPS. Our proposed lightweight beacon detection model achieves a 15.2% FPS improvement compared to the baseline and outperforms all competing models, reaching the highest FPS of 76. In terms of accuracy, the baseline model achieves an AP50(B) of 91.1% and an AP50(P) of 97.5%. Our lightweight model yields an AP50(B) of 90.9% and an AP50(P) of 97.4%, with only a minimal difference from the baseline. Additionally, this study compares our model with others such as YOLOv11n-pose, YOLOv12n-pose, and YOLO8-redter. Experimental results indicate that these models perform worse than ours in terms of both accuracy and real-time performance.
6.1.2. Ablation Experiment
The ablation experiments aim to clarify the functions of the various modules and structures in the model. To evaluate the effects of the proposed modules, we conducted ablation experiments on the underwater optical beacon dataset; the results are shown in Table 8. As the table shows, the C3k2 and CSPPC modules contribute most to model lightweighting. Replacing the C2f structure in the neck with C3k2 alone reduced the number of parameters by 19.4% and the FLOPs by 20.2%; using the CSPPC module alone reduced the parameters by 16.1% and the FLOPs by 15.5%. When both modules are introduced simultaneously, the model achieves a significant reduction in parameters and computation, but AP50 decreases by more than one point. We therefore replaced many of the ordinary convolutions in the original model with GhostConv, forming a lightweight backbone for underwater optical beacon detection. The experimental results show that this structure adapts well to the monotonous, high-contrast environments of underwater optical beacons: combined with either the C3k2 or the CSPPC module, it achieves model lightweighting while preserving accuracy.
Finally, our new underwater optical detection model incorporates the GhostConv, CSPPC, and C3k2 modules. With only a 0.2% reduction in average precision, the total parameters are compressed to 1.8 M (a 41.9% reduction), the computational cost drops to 4.8 GFLOPs (a 42.9% reduction), and the FPS rises to 76 (a 15.2% increase). The results indicate that, under the synergistic effect of multiple modules, our lightweight optical beacon detection model surpasses the lightweighting limit of any single module: it optimizes both parameter count and computational cost while preserving detection performance and inference speed, verifying the cumulative gain of the improvement strategy.
6.2. Underwater Optical Docking Experiment
This experiment aims to verify the real-time performance and stability of the proposed new optical guidance algorithm during the AUV’s “precise guidance phase” and systematically confirm the performance improvement of the proposed underwater guidance algorithm compared with traditional algorithms.
6.2.1. Experimental Setup
The experiment was conducted in a clear, chlorophyll-rich lake to evaluate the proposed optical guidance algorithm under realistic underwater conditions. A medium-sized commercial AUV was used for validation, with all motion controlled by the vehicle's onboard navigation and control system. The AUV adopts the standard docking-alignment architecture common to commercial platforms: heading correction is handled by a built-in LOS-based controller, and depth keeping relies on an internal PID stabilizer. Although the internal control laws are proprietary, such mature low-level modules autonomously handle heading regulation, depth keeping, surge thrust management, and attitude stabilization; they are standard across off-the-shelf AUV platforms and are neither implemented nor modified in this study. Consistent with terminal docking practices reported in the literature, such systems generally employ heading-based lateral correction together with a depth-holding or touchdown-alignment controller, while maintaining an approximately constant forward speed during the final approach [41]. In this work, our algorithm interfaces with the AUV at the perception level only, providing relative pose measurements (lateral offset, depth offset, and relative heading angle) to the vehicle at 30 Hz. The lateral offset and heading angle are the primary quantities used by the AUV's navigation module to steer toward the docking line, while the depth component serves as a reference input for the vehicle's native depth-keeping controller. Since coarse acoustic alignment already places the AUV and docking station approximately on the same horizontal plane, the built-in control system performs all remaining adjustments to depth, forward thrust, and attitude stabilization independently of our algorithm. During docking, the AUV operated at a low speed of approximately 0.5 m/s to ensure stable visual perception and reliable pose estimation. The docking station employed a six-lamp array arranged in a near-circular configuration to provide coverage across different distances. The AUV carried an HD Multi SeaCam (DeepSea Power & Light, San Diego, CA, USA) mounted at the bow for real-time image capture (1080p, 30 fps). Visual data were processed onboard using a Jetson Orin module running the proposed algorithm.
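The perception-level interface can be summarized as a small message structure handed to the vehicle's navigation module each frame; the field names below are illustrative, not the platform's actual API.

```python
from dataclasses import dataclass

@dataclass
class RelativePose:
    """Perception output published to the navigation module at 30 Hz
    (hypothetical field names; the platform interface is proprietary)."""
    lateral_offset_m: float   # offset from the docking line (steering input)
    depth_offset_m: float     # reference for the native depth-keeping loop
    heading_deg: float        # relative heading to the docking axis
    axial_distance_m: float   # distance to the station along its z-axis
    valid: bool               # False when pose estimation failed this frame
```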
To minimize environmental variability and ensure a fair comparison of algorithms, video data from each docking trial were extracted for offline testing on a Jetson AGX Orin platform. All comparative methods, except the proposed algorithm, relied on RANSAC and Hungarian matching for beacon tracking and used the standard PnP algorithm for pose estimation. Four methods were evaluated, differing primarily in their beacon detection logic:
- (a) Threshold: simple global binarization.
- (b) Blob: regional morphological analysis.
- (c) YOLO: the baseline YOLO model.
- (d) Our Method: the proposed algorithm.
6.2.2. Experimental Results and Analysis
Figure 12 illustrates the operational states of four docking algorithms applied to the same video sequence. Frame 001 corresponds to the earliest entry into the precise guidance space among the four methods. As shown, the proposed algorithm enters this space first and successfully detects all six optical beacons. At this stage, neither the traditional Blob-based nor Threshold-based methods detect any beacon information. Although the original YOLO algorithm can detect beacons, its performance is unstable, as evidenced in Frames 80 and 160: while five beacons are detected in both frames, the lower-left beacon is missing in Frame 80, and the upper-right beacon is undetected in Frame 160, reflecting intermittent recognition and occasional misidentifications. This instability prevents the original YOLO algorithm from accurately matching beacons and performing PnP pose estimation, limiting it to coarse guidance using the LOS algorithm. In contrast, the proposed algorithm maintains stable tracking of all optical beacons throughout the process and extends the spatial range of precise guidance, effectively addressing the core challenge targeted in this study.
Figure 13 presents the position trajectories derived from three underwater docking experiments. These trajectories represent the AUV's vision-based pose estimates, not absolute ground-truth positions. Each image frame is processed to calculate the AUV's pose, and temporal interpolation is applied to smooth the discrete pose estimates into the continuous trajectories shown in the figure. While the trajectories exhibit near-linear characteristics, this behavior is expected under the low-speed (0.5 m/s) control strategy: the PID-based heading controller makes only small, gradual adjustments to maintain stability, resulting in a relatively smooth, near-linear trajectory pattern. This is a natural outcome of the system design, in which the AUV is expected to maintain minimal lateral and heading deviation during terminal docking. Additionally, the inward-tapered docking port geometry provides tolerance for residual lateral offsets. As a result, even with moderate noise in frame-level pose estimates, the AUV can dock successfully as long as its macroscopic trajectory remains within the acceptable capture corridor.
Figure 14 illustrates the corrections made by the algorithm in the face of interference during the docking process. The optimized algorithm does not blindly incorporate spurious signals that appear in occasional frames; instead, it bridges unstable intervals through Kalman-filter prediction, ensuring that the overall guidance process does not deviate because of occasional invalid frames. By mitigating unstable target detection in the visual guidance process, the algorithm enables the AUV to perform stable and precise guidance.
Table 9 presents statistics on the starting points at which the AUV enters the precise guidance space and on the total extent of the precise guidance space in the experiments. For ease of comparison, we use the distance along the Z-axis of the docking station coordinate system to characterize the extent of the precise guidance space. The data show that the original YOLO algorithm enters the precise guidance space at 6.9 m from the docking station, while our new algorithm enters this phase at 12.0 m, a 74% increase in entry distance. In terms of total extent, our algorithm can calculate the precise position and attitude of the AUV over a 10.3 m range, a 104% increase over the original YOLO algorithm. Meanwhile, the same docking video was tested on the Jetson Orin platform to measure end-to-end processing latency: the proposed method requires approximately 28 ms per frame from image capture to final pose estimation, satisfying the real-time requirement of 30 frames per second for the underwater camera. These results demonstrate that the proposed algorithm achieves significant improvements in both guidance range and real-time computational efficiency.
Finally, to analyze the stability characteristics of the algorithm, a frame-by-frame analysis was carried out on the docking data from the "precise guidance phase" of the three underwater docking experiments. The total number of frames after entering the precise guidance phase and the number of invalid frames in which precise guidance data was lost were recorded; the ratio of invalid frames to total frames was defined as the distortion rate and used to quantitatively evaluate the stability of the algorithm. The specific results are shown in Table 10 (a–d, corresponding to the four guidance strategies described in Section 6.2.1). The results show that, during precise guidance, our algorithm has the lowest distortion rate, only 1.76%, indicating that the stability of the algorithm is significantly improved by the spatiotemporal constraints.
7. Conclusions
To meet the precise docking requirements of AUVs, this study addresses the instability of traditional underwater optical guidance from three perspectives: system design, algorithmic robustness, and model lightweighting. First, a multi-dimensional optical docking system was developed to ensure continuity of optical guidance in conventional beacon configurations. By combining stereo beacons with a calibrated guidance-space model, the system can achieve continuous optical navigation across approximately 0–35 m in pool experiments. Second, to address beacon misdetections and temporal instability in complex underwater light fields, an adaptive optical guidance algorithm combining spatiotemporal feature fusion and Kalman-based trajectory prediction was proposed. This approach maintains beacon-tracking continuity by predicting motion trends and verifying candidate targets through dynamic spatiotemporal authentication. Lake experiments demonstrated that the algorithm extends the effective precise-guidance distance by about 74% and effectively suppresses intermittent flicker phenomena observed in original YOLO methods. Finally, a lightweight YOLO-based detection model was implemented to improve onboard real-time performance. The optimized network reduces both parameters and FLOPs by approximately 42% while preserving detection accuracy, achieving 76 FPS on standard embedded hardware, sufficient for processing 30 FPS video streams in real-time AUV docking applications.
Despite these advantages, several limitations remain. The method assumes relatively stable illumination and moderate water clarity, conditions that may not be met in highly turbid or dynamic environments. Differences observed between pool and lake experiments further emphasize the sensitivity of optical guidance to water quality and light scattering. In addition, while individual modules have been validated independently, a fully integrated end-to-end docking prototype has not yet been realized, and the quantitative contribution of the proposed Spatiotemporally Adaptive Association Algorithm has not been analyzed. Future work will focus on completing system-level integration to form a unified docking prototype, enabling comprehensive assessment of interactions among guidance, perception, and control modules. Adaptive parameter tuning and lightweight sensitivity analysis will be applied to the optical propagation model and perception pipeline to improve their adaptability across diverse water conditions. Meanwhile, the beacon simulation tool will be extended to incorporate variations in illumination and turbidity, supporting more robust model generalization. These enhancements aim to increase robustness across diverse underwater conditions and strengthen the generalization capability of the lightweight detection model. Collectively, these efforts aim to enhance spatial coverage, environmental robustness, and operational reliability, ultimately supporting practical deployment in real-world underwater docking scenarios.
Author Contributions
Conceptualization, K.S., W.Z. and Y.L.; methodology, K.S. and W.Z.; software, W.Z.; validation, W.Z. and K.S.; formal analysis, W.Z.; investigation, K.S., W.Z. and Y.L.; resources, K.S.; data curation, W.Z. and Y.L.; writing—original draft preparation, W.Z.; writing—review and editing, K.S. and W.Z.; visualization, W.Z.; supervision, K.S.; project administration, K.S.; funding acquisition, K.S. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Data Availability Statement
The code supporting this study is publicly available at https://github.com/Judyrobot/Shenian-12yue, accessed on 9 December 2025. Additional experimental data are available from the corresponding author upon reasonable request.
Acknowledgments
The authors gratefully acknowledge the support extended by the State Key Laboratory of Robotics and Intelligent Systems, Shenyang, in encouraging this research work.
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Curtin, T.B.; Bellingham, J.G.; Catipovic, J.; Webb, D. Autonomous Oceanographic Sampling Networks. Oceanography 1993, 6, 86–94.
- Page, B.R.; Mahmoudian, N. Simulation-Driven Optimization of Underwater Docking Station Design. IEEE J. Ocean. Eng. 2020, 45, 404–413.
- Ji, D.; Cheng, H.; Zhou, S.; Li, S. Dynamic Model Based Integrated Navigation for a Small and Low Cost Autonomous Surface/Underwater Vehicle. Ocean Eng. 2023, 276, 114091.
- Zan, L.; Ji, D.; Pang, S.; Chan, S.; Bai, J. Compliance Docking Based on Flow Field Hydrodynamics for Underwater Self-Reconfigurable Robots. Ocean Eng. 2026, 343, 123615.
- Maki, T.; Shiroku, R.; Sato, Y.; Matsuda, T.; Sakamaki, T.; Ura, T. Docking Method for Hovering Type AUVs by Acoustic and Visual Positioning. In Proceedings of the 2013 IEEE International Underwater Technology Symposium (UT), Tokyo, Japan, 11–14 March 2013; pp. 1–6.
- Trslic, P.; Rossi, M.; Robinson, L.; O'Donnel, C.W.; Weir, A.; Coleman, J.; Riordan, J.; Omerdic, E.; Dooly, G.; Toal, D. Vision Based Autonomous Docking for Work Class ROVs. Ocean Eng. 2020, 196, 106840.
- Li, D.; Zhang, T.; Yang, C. Terminal Underwater Docking of an Autonomous Underwater Vehicle Using One Camera and One Light. Mar. Technol. Soc. J. 2016, 50, 58–68.
- Kim, B.; Sung, M.; Lee, M.; Cho, H.; Yu, S.C. Imaging Sonar Based AUV Localization and 3D Mapping Using Image Sequences. In Proceedings of the Global Oceans 2020: Singapore–U.S. Gulf Coast, Biloxi, MS, USA, 5–30 October 2020; pp. 1–6.
- Li, Y.; Sun, K.; Han, Z.; Lang, J. Deep Learning-Based Docking Scheme for Autonomous Underwater Vehicles with an Omnidirectional Rotating Optical Beacon. Drones 2024, 8, 697.
- Yahya, M.F.; Arshad, M. Robust Recognition of Targets for Underwater Docking of Autonomous Underwater Vehicle. In Proceedings of the IEEE Autonomous Underwater Vehicle Symposium (AUV), Tokyo, Japan, 6–9 November 2016; pp. 401–407.
- Zhao, C.; Dong, H.; Wang, J.; Qiao, T.; Yu, J.; Ren, J. Dual-Type Marker Fusion-Based Underwater Visual Localization for Autonomous Docking. IEEE Trans. Instrum. Meas. 2024, 73, 8500211.
- Yan, Z.; Gong, P.; Zhang, W.; Li, Z.; Teng, Y. Autonomous Underwater Vehicle Vision Guided Docking Experiments Based on L-Shaped Light Array. IEEE Access 2019, 7, 72567–72576.
- Kirk, J.T.O. Volume Scattering Function, Average Cosines, and the Underwater Light Field. Limnol. Oceanogr. 1991, 36, 455–467.
- Nabahirwa, E.; Song, W.; Zhang, M.; Chen, S. An Empirical Study on the Robustness of YOLO Models for Underwater Object Detection. arXiv 2025.
- Fan, S.; Liu, C.; Li, B.; Xu, Y.; Xu, W. AUV Docking Based on USBL Navigation and Vision Guidance. J. Mar. Sci. Technol. 2019, 24, 673–685.
- Wang, S.M.; Wang, X.W.; Lei, P.S.; Chen, J.A.; Xu, Z.K.; Yang, Y.Q.; Sun, L.; He, J.; Zhou, Y. Blue ROVs Diode Light for Underwater Optical Vision Guidance in AUV Docking. Proc. SPIE 2019, 11182, 111820Z.
- Zhong, L.; Li, D.; Lin, M.; Lin, R.; Yang, C. A Fast Binocular Localisation Method for AUV Docking. Sensors 2019, 19, 1735.
- Ni, T.; Sima, C.; Zhang, W.; Wang, J.; Guo, J.; Zhang, L. Vision-Based Underwater Docking Guidance and Positioning: Enhancing Detection with YOLO-D. J. Mar. Sci. Eng. 2025, 13, 102.
- Ren, R.; Zhang, L.; Liu, L.; Yuan, Y. Two AUVs Guidance Method for Self-Reconfiguration Mission Based on Monocular Vision. IEEE Sens. J. 2021, 21, 10082–10090.
- Alla, D.N.V.; Jyothi, V.B.N.; Venkataraman, H.; Ramadass, G.A. Vision-Based Deep Learning Algorithm for Underwater Object Detection and Tracking. In Proceedings of the OCEANS 2022–Chennai, Chennai, India, 21–24 February 2022; pp. 1–6.
- Lin, M.; Lin, R.; Li, D.; Yang, C. Light Beacon-Aided AUV Electromagnetic Localization for Landing on a Planar Docking Station. IEEE J. Ocean. Eng. 2023, 48, 677–688.
- Kobatake, K.; Okamoto, A.; Sasano, M.; Inaba, S.; Fujiwara, T. Docking Control Method Using LEDs Detection by Hovering AUV "Hobalin" for Deep-Sea Research. In Proceedings of the OCEANS 2024–Halifax, Halifax, NS, Canada, 23–26 September 2024; pp. 1–6.
- Zhang, Z.; Ding, W.; Wu, R.; Lin, M.; Li, D.; Lin, R. Autonomous Underwater Vehicle Cruise Positioning and Docking Guidance Scheme. J. Mar. Sci. Eng. 2024, 12, 1023.
- Vandavasi, B.N.J.; Shakeera, S.; Narayanaswamy, V.; Ramadass Gidugu, A.; Venkataraman, H. EM and Vision-Aided Multisensor Fusion Homing Guidance System (MSF-HGS) for Intelligent AUVs. IEEE Sens. J. 2025, 25, 1912–1926.
- Veach, E. Robust Monte Carlo Methods for Light Transport Simulation. Ph.D. Dissertation, Stanford University, Stanford, CA, USA, 1997. Available online: https://graphics.stanford.edu/papers/veach_thesis/thesis-bw.pdf (accessed on 9 December 2025).
- Wang, M.; Wang, Y.; Cheng, Q. Study on the Transmission and Scattering Characteristics of Blue-Green Lasers Through the Atmosphere-Ocean Interface. In Proceedings of the 2021 13th International Symposium on Antennas, Propagation and EM Theory (ISAPE), Zhuhai, China, 1–4 December 2021; pp. 1–3.
- Sahu, S.K.; Shanmugam, P. A Theoretical Study on the Impact of Particle Scattering on the Channel Characteristics of Underwater Optical Communication System. Opt. Commun. 2018, 408, 3–14.
- Pitarch, J.; Brando, V.E.; Talone, M.; Mazeran, C.; D'Alimonte, D.; Kajiyama, T.; Kwiatkowska, E.; Dessailly, D.; Gossn, J.I. Analytical Modeling and Correction of the Ocean Colour Bidirectional Reflectance Across Water Types. Remote Sens. Environ. 2025, 329, 114920.
- Wang, Z.; Xiang, X.; Guan, X.; Pan, H.; Yang, S.; Chen, H. Deep Learning-Based Robust Positioning Scheme for Imaging Sonar Guided Dynamic Docking of Autonomous Underwater Vehicle. Ocean Eng. 2024, 293, 116704.
- Zhang, B.; Zhong, P.; Yang, F.; Zhou, T.; Shen, L. Fast Underwater Optical Beacon Finding and High Accuracy Visual Ranging Method Based on Deep Learning. Sensors 2022, 22, 7940.
- Ji, D.; Ogbonnaya, S.G.; Hussain, S.; Hussain, A.F.; Ye, Z.; Tang, Y.; Li, S. Three-Dimensional Dynamic Positioning Using a Novel Lyapunov-Based Model Predictive Control for Small Autonomous Surface/Underwater Vehicles. Electronics 2025, 14, 489.
- Zeng, B.; Li, Y.; Gong, X.; Wang, H.; Huang, Z.; Zhou, H. Velocity Trajectory Planning and Tracking Control Based on Basis Function Superposition and Metaheuristic Algorithms for Planar Underactuated Manipulators. Actuators 2025, 14, 505.
- Loebis, D.; Sutton, R.; Chudley, J.; Naeem, W. Adaptive Tuning of a Kalman Filter via Fuzzy Logic for an Intelligent AUV Navigation System. Control Eng. Pract. 2004, 12, 1531–1539.
- Zhang, Y.; Sun, P.; Jiang, Y.; Yu, D.; Weng, F.; Yuan, Z.; Luo, P.; Liu, W.; Wang, X. ByteTrack: Multi-Object Tracking by Associating Every Detection Box. In Proceedings of the Computer Vision – ECCV 2022, Tel Aviv, Israel, 23–27 October 2022; pp. 1–21.
- Li, S.; Xu, C.; Xie, M. A Robust O(n) Solution to the Perspective-n-Point Problem. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 34, 1444–1450.
- Maji, D.; Nagori, S.; Mathew, M.; Poddar, D. YOLO-Pose: Enhancing YOLO for Multi Person Pose Estimation Using Object Keypoint Similarity Loss. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), New Orleans, LA, USA, 19–20 June 2022; pp. 2636–2645.
- Liu, S.; Ozay, M.; Okatani, T.; Xu, H.; Sun, K.; Lin, Y. Detection and Pose Estimation for Short-Range Vision-Based Underwater Docking. IEEE Access 2019, 7, 2720–2749.
- Han, K.; Wang, Y.; Tian, Q.; Guo, J.; Xu, C.; Xu, C. GhostNet: More Features From Cheap Operations. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020; pp. 1577–1586.
- Chen, J.; Kao, S.H.; He, H.; Zhuo, W.; Wen, S.; Lee, C.H.; Chan, S.H.G. Run, Don't Walk: Chasing Higher FLOPS for Faster Neural Networks. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 18–22 June 2023; pp. 12021–12031.
- Khanam, R.; Hussain, M. YOLOv11: An Overview of the Key Architectural Enhancements. arXiv 2024.
- Esteba, J.; Cieślak, P.; Palomeras, N.; Ridao, P. Docking of Non-Holonomic AUVs in Presence of Ocean Currents: A Comparative Survey. IEEE Access 2021, 9, 86607–86631.
Figure 1. Underwater optical docking scheme.
Figure 2. Stereo optical beacon system.
Figure 3. Stereo beacon detection results: (a) long-distance light detected by YOLO; (b) array light detected by YOLO; (c) red light detected by YOLO.
Figure 4. Underwater docking station reference coordinate system.
Figure 5. Underwater adaptive optical guidance algorithm.
Figure 6. Typical pictures of underwater optical beacons.
Figure 7. Underwater lightweight optical beacon model.
Figure 8. The feature map output after one convolution.
Figure 9. Structure of the CSPPC module.
Figure 10. Principle of Partial Convolution.
Figure 11. Structure of the C3k2 module.
Figure 12. Operating status of four underwater optical guidance algorithms. Subfigures (a–d) correspond to the four target detection methods listed in Section 6.2.1.
Figure 13. Top view of vision-based pose estimation for three AUV dockings: frame-wise pose scatter points and trajectory curves.
Figure 14. Image illustrating interference correction.
Table 1. Evolution of optical docking in recent years.

| Year | Beacon | Detection Method | Range (m) | Environment/Depth (m) | Precise Pose | Success Rate |
|---|---|---|---|---|---|---|
| 2018 [15] | 1 light | Traditional | 0–20 | Pool/1 | No | 85.7% |
| 2019 [12] | L-shaped light array | Traditional | 0–10 | Pool/0.5–4.5 | Yes | 100% |
| 2019 [16] | Blue laser | Traditional | 0–10 | Pool/- | No | - |
| 2019 [17] | Three lights (white) | Traditional | 1.2–3.6 | Pool/0–1.2 | Yes | 100% |
| 2020 [18] | Circular LED ring | YOLO | 0–15 | Pool/4 | Yes | - |
| 2020 [6] | Asymmetrical LED array | Traditional | 0–8 | North Atlantic/20–25 | Yes | 100% |
| 2021 [19] | Four lights and ArUco marker | YOLOv3 | 3–15 | Pool/- | Yes | 100% |
| 2022 [20] | Circular lamp array | YOLOv4 | 0.5–5.5 | Shallow waters/- | Yes | - |
| 2023 [21] | One blue-green light and EM beacon | Traditional | 0–10 | Pool/0.5–1 | Yes | 100% |
| 2024 [11] | 3 green central lights, 3 blue peripheral lights, ArUco marker | Traditional | 0–1.1 | Pool/0–1 | Yes | 100% |
| 2024 [22] | 2 linear LEDs and 1 circular LED (red) | Traditional | 0–5 | Pool/4.5 | Yes | 100% |
| 2024 [23] | 50 W lamp (white) | Traditional | 5–20 | Lake/0.6 | No | 80% |
| 2024 [9] | Rotating beacon (blue) | YOLOv8 | 0–45 | Pool/3 | No | - |
| 2025 [24] | LED strip light (white) and EM beacon | YOLOv5 | 0–10 | Pool/2 | Yes | 99.6% |
Table 2. Optical beacon parameters.
Table 3. Parameters of the optical beacon attenuation model.

| Symbol | Physical Meaning | Unit |
|---|---|---|
|  | Beam half divergence angle |  |
|  | Optical absorption coefficient of water |  |
|  | Optical scattering coefficient of water |  |
|  | Scattering anisotropy factor | - |
|  | Turbulence intensity | - |
|  | Distance decay exponent | - |
|  | Beam propagation distance |  |
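For intuition, the following minimal Python sketch shows how the quantities in Table 3 typically combine in a Beer–Lambert-style attenuation law; the functional form and the coefficient values (a, b, n) here are illustrative assumptions, not the exact model fitted in this work.

```python
import numpy as np

# Illustrative Beer-Lambert-style attenuation sketch using the roles of the
# Table 3 parameters; coefficient values are placeholders, not fitted values.
def beacon_intensity(I0, d, a=0.05, b=0.02, n=2.0):
    """On-axis intensity after propagating a distance d (m).

    I0 -- source intensity
    a  -- absorption coefficient of water (assumed units 1/m)
    b  -- scattering coefficient of water (assumed units 1/m)
    n  -- distance decay exponent (geometric spreading)
    """
    c = a + b                                   # total attenuation coefficient
    return I0 * np.exp(-c * d) / np.maximum(d, 1e-6) ** n

# Example: relative intensity between 5 m and 20 m in moderately clear water
print(beacon_intensity(1.0, 5.0) / beacon_intensity(1.0, 20.0))
```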
Table 4. Simulation results for beacons with different scattering angles and numbers.

| Angle (°) | 4 Lights (m³) | 5 Lights (m³) | 6 Lights (m³) | 8 Lights (m³) | 10 Lights (m³) |
|---|---|---|---|---|---|
| 5 | 24.1 | 39.3 | 63.2 | 118.2 | 172.4 |
| 10 | 49.8 | 69.7 | 101.6 | 168.6 | 232.9 |
| 20 | 100.3 | 131.8 | 178.3 | 254.3 | 330.3 |
| 30 | 139.5 | 174.9 | 226.0 | 313.3 | 388.5 |
| 40 | 163.8 | 204.8 | 252.7 | 345.3 | 413.5 |
| 50 | 177.4 | 214.9 | 263.1 | 345.0 | 415.9 |
| 60 | 178.7 | 216.6 | 264.8 | 348.9 | 414.5 |
| 70 | 171.1 | 209.5 | 253.2 | 334.4 | 392.3 |
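Identifiable volumes of this kind can be approximated with a simple voxel-counting scheme: sample a 3D grid, mark a point visible when it falls inside the divergence cone of at least one beacon with sufficient modeled intensity, and sum the visible voxel volumes. The sketch below illustrates that scheme under stated assumptions; the beacon ring layout, detection threshold, and intensity model are illustrative placeholders, not the spatial modeling tool used in this work.

```python
import numpy as np

def identifiable_volume(beacons, half_angle_deg, threshold=1e-3,
                        extent=10.0, step=0.2):
    """Voxel-count estimate (m^3) of the region seen by any beacon cone."""
    xs = np.arange(-extent, extent, step)
    pts = np.stack(np.meshgrid(xs, xs, xs, indexing="ij"), axis=-1).reshape(-1, 3)
    visible = np.zeros(len(pts), dtype=bool)
    cos_half = np.cos(np.radians(half_angle_deg))
    for pos, axis in beacons:                   # axis: unit pointing vector
        v = pts - np.asarray(pos)
        d = np.linalg.norm(v, axis=1)
        inside_cone = (v @ axis) > cos_half * d     # angular acceptance test
        intensity = np.exp(-0.07 * d) / np.maximum(d, 1e-6) ** 2  # placeholder model
        visible |= inside_cone & (intensity > threshold)
    return visible.sum() * step ** 3            # voxel count -> volume

# Hypothetical ring of six forward-facing beacons of 1 m radius
ring = [((np.cos(t), np.sin(t), 0.0), np.array([0.0, 0.0, 1.0]))
        for t in np.linspace(0, 2 * np.pi, 6, endpoint=False)]
print(identifiable_volume(ring, half_angle_deg=30))
```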
Table 5. Precise guidance space for different deviation angles.

| Deviation Angle (°) | 10 | 8 | 6 | 4 | 2 | 0 | −2 | −4 |
|---|---|---|---|---|---|---|---|---|
| Volume (m³) | 290.5 | 310.7 | 324.8 | 313.4 | 286.8 | 260.9 | 226.5 | 195.9 |
Table 6. Model training configuration.

| Configuration | Specification (Train) | Specification (Inference) |
|---|---|---|
| GPU | NVIDIA GeForce RTX 5070 | NVIDIA Ampere, 2048 CUDA cores, 64 Tensor Cores |
| CPU | AMD Ryzen 9 8945HX | 12-core ARM Cortex-A78AE |
| PyTorch | 2.7.1 | 2.3.0 |
| CUDA | 12.6 | 12.2 |
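For reproducibility, training a YOLOv8n-pose baseline on the stack in Table 6 follows the standard Ultralytics API. The snippet below is a minimal sketch: the dataset file beacon-pose.yaml and the hyperparameter values are placeholders, not the configuration actually used in our experiments.

```python
# Minimal Ultralytics training sketch for the YOLOv8n-pose baseline.
# "beacon-pose.yaml" is a hypothetical keypoint dataset config; epochs and
# image size are placeholder values, not the paper's exact settings.
from ultralytics import YOLO

model = YOLO("yolov8n-pose.pt")        # pretrained pose-estimation baseline
results = model.train(
    data="beacon-pose.yaml",           # hypothetical dataset definition
    epochs=300,
    imgsz=640,
    device=0,                          # single NVIDIA GPU
)
```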
Table 7. Results of comparative experiments.

| Method | AP50(B) | AP50(P) | AP50-95(B) | Parameters (M) | GFLOPs | FPS |
|---|---|---|---|---|---|---|
| Traditional ¹ | - | - | - | - | - | 50 |
| YOLOv8n-pose | 0.911 | 0.975 | 0.974 | 3.1 | 8.4 | 66 |
| YOLOv11n-pose | 0.903 | 0.974 | 0.972 | 2.7 | 6.7 | 61 |
| YOLOv12n-pose | 0.899 | 0.971 | 0.970 | 2.3 | 6.2 | 64 |
| Yolo8-redter | 0.824 | - | - | 6.1 | 11.8 | 64 |
| Ours | 0.909 | 0.974 | 0.974 | 1.8 | 4.8 | 76 |
Table 8. Results of ablation experiments.

| Model | Parameters (M) | GFLOPs | AP50(B) | AP50(P) | AP50-95(P) | FPS |
|---|---|---|---|---|---|---|
| YOLOv8n-pose | 3.1 | 8.4 | 0.911 | 0.975 | 0.974 | 66 |
| A | 2.5 | 6.7 | 0.907 | 0.975 | 0.974 | 70 |
| B | 2.6 | 7.1 | 0.903 | 0.971 | 0.970 | 73 |
| C | 2.8 | 7.8 | 0.910 | 0.976 | 0.975 | 72 |
| A + B | 2.1 | 5.3 | 0.901 | 0.973 | 0.975 | 73 |
| B + C | 2.3 | 6.5 | 0.908 | 0.975 | 0.974 | 70 |
| A + C | 2.2 | 6.0 | 0.910 | 0.976 | 0.975 | 74 |
| A + B + C | 1.8 | 4.8 | 0.909 | 0.974 | 0.974 | 76 |
Table 9. Application parameters of AUV underwater guidance algorithms.

| Method | Entry Point (m) | Total Space (m) | Processing Time per Frame (ms) |
|---|---|---|---|
| (a) | 6.3 | 4.5 | 35 |
| (b) | 7.8 | 5.9 | 43 |
| (c) | 6.9 | 4.9 | 44 |
| (d) | 12.0 | 10.3 | 28 |
Table 10. Distortion rates of the four algorithms.

| | (a) | (b) | (c) | (d) |
|---|---|---|---|---|
| Invalid frames | 43 | 101 | 499 | 28 |
| Total frames | 838 | 1129 | 1591 | 1591 |
| Distortion rate | 5.13% | 8.95% | 31.36% | 1.76% |
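The distortion rate is simply the share of invalid frames among all processed frames; the short check below reproduces the Table 10 percentages.

```python
# Distortion rate = invalid frames / total frames, per Table 10.
invalid = {"a": 43, "b": 101, "c": 499, "d": 28}
total = {"a": 838, "b": 1129, "c": 1591, "d": 1591}
for k in invalid:
    print(k, f"{invalid[k] / total[k]:.2%}")  # a 5.13%, b 8.95%, c 31.36%, d 1.76%
```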
Disclaimer/Publisher's Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).