1. Introduction
Autonomous Underwater Vehicles (AUVs) are indispensable platforms for ocean exploration, environmental monitoring, and military operations, and their autonomy and reliability directly influence mission execution efficiency [1]. Autonomous vision-based docking between an AUV and a docking station is a key enabling technology for underwater energy replenishment, data recovery, and long-term resident observation [2,3]. However, conducting docking experiments directly in real ocean environments remains challenging [4,5]. Sea trials are expensive and typically depend on large support platforms and prolonged offshore operations. At the same time, underwater visual perception is substantially degraded by distance-dependent light attenuation, turbidity-induced scattering and contrast loss, and color distortion, which collectively reduce the robustness and stability of marker detection and localization (e.g., of beacon lights or structured patterns) in complex waters. Moreover, it is difficult to precisely reproduce and systematically control diverse combinations of turbidity, illumination, and occlusion conditions in physical experiments, which limits comprehensive, repeatable validation and robustness evaluation of vision-based docking localization algorithms. Therefore, a low-cost, controllable, and repeatable simulation-based validation environment is urgently needed to systematically evaluate and compare underwater visual docking localization methods and to support algorithm design and iterative optimization.
In recent years, simulation technologies for underwater robots have made notable progress, primarily focusing on motion control, path planning, and sensor modeling. Platforms such as Gazebo, Unity, and Unreal Engine (UE) provide fundamental support for hydrodynamics and multi-sensor integration and have become mainstream choices for simulating ground and aerial robots [6,7,8]. Wen et al. [9] implemented physics-engine simulation of an autonomous underwater glider (AUG) on the Gazebo platform and realized feedback of the physical model in digital space via WebGL-based 3D visualization; by incorporating an improved artificial potential field method and edge-computing techniques, they achieved formation path-optimization control for multiple AUGs. Wang et al. [10] proposed an underwater robot dynamic simulation platform based on UE, AirSim, and a Distributed Simulation Algorithm Platform (DASP), which provides a safe, efficient, and reproducible virtual test environment for robot control algorithms through water-body modeling, lighting and shading rendering, and physical interaction; however, scene construction relies heavily on manual parameter tuning and substantial computational resources, and its hydrodynamic and communication models still require refinement, so it cannot yet fully replace real-water testing. Deng et al. [11] developed an underwater robot simulation framework using MATLAB/Simulink 2021B and UE4, enabling rapid deployment and validation of sensor noise models in a highly realistic virtual underwater environment; through noise-injection experiments on stereo cameras and radar, they verified its effectiveness in improving simulation fidelity, but the framework currently supports only Gaussian noise, and its extensibility to additional sensor types and noise models remains limited. Wang et al. [12] presented a co-simulation approach based on Webots [13] and MATLAB/Simulink for modeling and algorithm verification of underwater robot control systems: Webots was used to construct the physical model, Simulink was used to implement controllers for velocity and attitude tracking, and real-time data exchange was achieved via APIs to improve simulation credibility and development efficiency; nevertheless, more complex hydrodynamic and noise models have not yet been incorporated.
Existing studies have largely focused on AUV navigation control or sonar sensing, whereas simulation platforms specifically designed for vision-guided docking and covering the full “perception–localization–docking” pipeline remain scarce. Li et al. [14] developed a virtual-reality-based online simulation system for UUV underwater docking to visualize the docking process and validate algorithms. The system integrated four navigation modalities with fuzzy PID control, offering real-time, interactive, and immersive operation; however, its complex architecture imposed strong requirements on communication synchronization and hardware, and its extensibility still needs improvement. Zhang et al. [15] fused visual, inertial, and pressure-sensor information and constructed a virtual docking simulation system by integrating vehicle dynamics, control algorithms, perception modules, and environmental disturbances; robust optically guided docking was achieved using an unscented Kalman filter. Nevertheless, the framework was limited to single-beacon scenarios and has not yet been comprehensively validated in real waters to assess its generalization capability and long-term stability. Jena et al. [16] built an AUV underwater docking simulation platform based on Unity3D 2020 and MATLAB 2020B, incorporating YOLOv3 for real-time detection of the optical docking target and vision-based guidance; the platform supported six-DOF motion simulation, image acquisition and processing, and TCP/IP communication with MATLAB. However, the simulation environment was often idealized, particularly in the modeling of critical underwater visual factors such as light attenuation, turbidity variations, and target occlusion, which hinders faithful reproduction of complex underwater optical scenes and thereby limits systematic robustness evaluation of vision-based docking localization algorithms.
In summary, existing vision-based docking simulations still suffer from pronounced limitations. On the one hand, underwater optical effects, such as depth-dependent illumination attenuation and turbidity, are often modeled in an overly coarse manner, making it difficult to faithfully capture image degradation and its impact on target recognition under complex water conditions. On the other hand, the morphology and discriminative features of visual docking markers are not simulated with sufficient fidelity, which limits compatibility with diverse vision algorithms and reduces the diversity of test scenarios and performance evaluations [17]. Moreover, although Webots has been widely used as a general-purpose robotic simulation platform that integrates a physics engine and multiple sensors in terrestrial and aerial domains, its capability for underwater visual-scene modeling tailored to AUV vision-based docking remains limited, and a dedicated system-level solution is still lacking. Overall, an integrated simulation of challenging underwater visual conditions, particularly multi-illumination, multi-turbidity, and severe occlusion, remains largely absent, highlighting an urgent need for more realistic simulation tools with finer-grained modeling of vision-related factors [18].
Against this backdrop, for the design and evaluation of vision-based localization methods, developing an AUV docking simulation platform with underwater visual effects at its core has become a key link between algorithm development and sea-trial validation. By integrating (i) underwater illumination and depth-attenuation modeling, (ii) a tunable turbidity-induced imaging degradation model, (iii) mechanisms for generating multiple types of visual markers and complex occlusion scenarios, and (iv) approximate kinematic modeling into Webots, systematic and quantitative simulation-based testing and comparative analysis of vision-based docking localization algorithms can be conducted in low-cost, parameter-controllable, and repeatable settings. This, in turn, can substantially reduce the risks and resource consumption associated with sea trials and accelerate iterative optimization and engineering deployment. The contributions of this paper are as follows:
1. An underwater docking fiducial system was developed to support multiple classes of vision algorithms. By exploiting hierarchical point–line–plane features, a closed-loop modeling pipeline was established that spans fiducial design, visual detection, and pose estimation, providing a unified visual measurement interface for end-to-end vision-in-the-loop AUV docking simulation.
2. A parametric underwater optics-and-occlusion model was established by extending the Webots fog model with depth-dependent attenuation, adjustable turbidity, and randomized occlusion, enabling controllable simulation of representative underwater illumination and turbidity conditions.
3. An integrated docking-process simulation that accounts for both AUV kinematics and camera field-of-view constraints was implemented, along with a batch benchmarking framework for quantitative comparison of localization accuracy, docking success rate, and robustness across multiple trajectories and operating conditions.
The remainder of this paper is organized as follows.
Section 2 describes the overall methodology and implementation of a simulation platform for underwater vision-based docking, including AUV motion and camera modeling, a multi-type visual marker suite, underwater illumination and attenuation models, and a scenario generation mechanism that accounts for occlusions and current-induced disturbances.
Section 3 presents the platform’s operation and evaluation; visual localization algorithms were quantitatively tested under representative conditions with varying illumination, turbidity, and occlusion, which confirmed the platform’s functional completeness and the accuracy of its results.
Section 4 concludes the paper and discusses how the platform supports the design, comparative evaluation, and iterative optimization of underwater docking localization algorithms, as well as directions for future extension.
3. Results and Discussion
Progressive validation of an underwater visual docking localization simulation platform was performed across three levels: AUV motion, environment/landmarks, and task-level docking. At the AUV level, the simulated camera intrinsics were calibrated and benchmarked against those of the physical underwater camera to ensure consistency in imaging geometry, and an approximate motion model was used to control AUV attitude and translation so that the resulting trajectories matched typical docking maneuvers. At the environment and landmark level, the imaging behaviors of representative markers (e.g., point light sources and QR codes) were analyzed under varying turbidity, light attenuation, and occlusion conditions, confirming that key underwater-vision phenomena, namely signal attenuation, contrast degradation, and partial or complete target occlusion, were reproduced. On this basis, a classical vision-based localization method was used to systematically test the full visual docking process under four representative operating conditions (turbidity variation, attenuation variation, disturbance-speed variation, and occlusion variation), and the condition-dependent trends of localization accuracy and target recognition success were compared between simulation and real underwater docking scenarios. The results demonstrated that the performance trends observed in simulation were highly consistent with empirical trends from practical underwater visual docking, thereby providing strong behavioral evidence for the realism and credibility of the proposed simulation environment.
3.1. Simulation Validation of AUV
3.1.1. Validation of AUV Maneuvering Capability
In underwater vision-based docking and localization, the fidelity with which the simulated AUV platform reproduces the camera motion encountered in real operations directly determines the credibility of subsequent algorithmic evaluations. During docking, the camera’s spatial trajectory, body attitude, and viewing-angle variations correspond, respectively, to the camera position, motion speed, and observation pose in the localization problem. Accordingly, emphasis is placed not on how the motion is generated, but on whether the simulation carrier can reliably realize the motion outcomes required for docking scenarios along three essential dimensions: vertical displacement capability through depth variation, along-track progression capability through speed variation, and viewpoint/line-of-sight adjustment capability through heading variation. By covering these motion outcomes, the platform can reproduce representative sets of camera trajectories and pose changes that are typical of docking operations, thereby providing a motion foundation for systematic evaluation of vision-based docking localization algorithms under diverse trajectories and viewing conditions.
On this basis, the motion performance of the platform was examined in Webots. Ground-truth pose sequences were obtained by directly reading and converting the position and attitude variables provided at the Webots system level, yielding accurate reference values for quantitative assessment of visual localization algorithms. The step-change test curves in Figure 9 indicate that the platform exhibits clear and repeatable motion behavior in all three dimensions. For depth variation, the vehicle reaches the target depth rapidly and maintains a stable vertical position thereafter. For speed variation, velocity changes remain continuous and consistent across trials, supporting steady progression and displacement along the docking path. For heading variation, commanded heading changes are distinctly realized and subsequently held, ensuring that the camera viewing direction and observation angle evolve as intended. Compared with conventional simulations that provide only images and coarse trajectories, the present platform supplies high-precision, system-consistent ground-truth position and orientation data. The response curves further substantiate that the achievable depth, speed, and heading variations satisfy the operational requirements of the camera as a visual sensing device in docking, thereby enabling a quantifiable and repeatable benchmark for evaluating underwater vision-based docking localization algorithms.
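As a concrete illustration, a heading reference can be recovered from the rotation matrix that a Webots supervisor exposes via a node's getOrientation() call (which returns the 3 × 3 rotation matrix flattened row-major). The sketch below is a minimal, hypothetical helper, not the platform's actual code; the z-up axis convention is an assumption and should be adapted to the world coordinate system in use:

```python
import math

def heading_from_orientation(r):
    """Yaw angle (radians) from a 3x3 rotation matrix flattened row-major,
    e.g. as returned by a Webots Supervisor node's getOrientation().
    Assumes a z-up world; helper name is illustrative."""
    # For z-up, yaw = atan2(R[1][0], R[0][0]) = atan2(r[3], r[0]).
    return math.atan2(r[3], r[0])
```

Logging such system-level values alongside the commanded step changes yields the ground-truth sequences used for the quantitative comparisons above.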
3.1.2. Validation of Camera Carried by AUV
In AUV visual docking simulation, the credibility of the results hinges on whether the camera model reproduces geometric imaging behavior consistent with real engineering systems. Accordingly, the intrinsic-parameter configuration of the Webots camera is calibrated and validated to ensure agreement with a practical engineering camera in key parameters, including focal length, principal-point location, resolution, and field of view. This consistency supports engineering-level comparability in the distribution of feature points on the image plane and in the mapping between image measurements and spatial geometry. Because distortion in real underwater cameras is diverse, strongly lens-dependent, and typically compensated through calibration and undistortion in practice, complex distortion models are not explicitly constructed in the simulation. Instead, Webots-generated images are treated as outputs after distortion correction, and the modeling focuses on the general intrinsic geometry that most strongly affects visual-docking localization. This choice aligns with the engineering workflow of correcting distortion prior to localization and preserves the platform’s generality for comparative evaluation of different visual localization algorithms.
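Under the assumption that Webots images are already distortion-free, the intrinsic geometry discussed above reduces to the ideal pinhole projection. A minimal sketch (the helper name is illustrative, not from the paper):

```python
def project(point_cam, fx, fy, cx, cy):
    """Ideal (distortion-free) pinhole projection of a camera-frame point
    (X, Y, Z), Z > 0, onto the image plane using intrinsics fx, fy, cx, cy."""
    X, Y, Z = point_cam
    return (fx * X / Z + cx, fy * Y / Z + cy)
```

Because distortion is assumed corrected upstream, this mapping is the only geometric link the platform needs between image measurements and spatial structure.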
Building on the above, the classical Zhang calibration method was adopted to systematically calibrate the intrinsic parameters of the simulated camera and to analyze calibration errors, thereby validating the rationality of the proposed modeling assumptions. In Webots, a black-and-white chessboard calibration target was constructed, as shown in Figure 10. Multiple images were acquired by varying the relative pose between the camera and the target, and the intrinsic parameters (fx, fy, cx, cy) were estimated using Zhang’s method. The intrinsic ground-truth values were predefined in simulation, and five independent calibration trials were conducted at each of three resolutions (1080p, 720p, and 480p); the mean relative error for each intrinsic parameter was then computed. As shown in Table 4, the mean relative errors of all intrinsic parameters were below 1% across all resolutions. These results indicated that, within the Webots simulation environment, an engineering-consistent calibration workflow could stably recover high-accuracy camera intrinsics, and that the geometric imaging characteristics of the simulated camera were highly consistent with those of a real camera. This consistency ensured that subsequent simulation-based tests of the visual docking and localization algorithm were engineering-relevant and comparable in terms of feature-point acquisition and error levels.
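The per-parameter error metric reported in Table 4 can be sketched as follows (the helper name is an assumption; the paper does not give an implementation):

```python
def mean_relative_error(estimates, truth):
    """Mean relative error of repeated calibration estimates of one
    intrinsic parameter (e.g. fx) against its predefined ground truth."""
    return sum(abs(e - truth) / abs(truth) for e in estimates) / len(estimates)
```

Applying this to the five trials per resolution and per parameter yields the sub-1% figures summarized in Table 4.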
3.2. Simulation Validation of Environment
3.2.1. Effects of Turbidity and Attenuation on Visual Marker Recognition
Because visual localization prioritized stable and repeatable target position and pose estimation rather than an exact reproduction of the microscopic physics of underwater light transport, the optical modeling in this study emphasized consistency in macroscopic degradation trends. Specifically, as turbidity and water depth varied across a wide range, the simulated changes in image brightness, contrast, and target distinguishability were expected to follow the same direction of change—and be of comparable magnitude—as those observed in real underwater environments. This objective matched the design intent of the proposed turbidity–depth attenuation model, which aimed to generate representative degradation samples spanning large parameter ranges for image preprocessing and robustness evaluation, rather than to achieve high-fidelity optical simulation under a single operating condition. Accordingly, visualization and recognition experiments were conducted to verify that, under different turbidity and depth settings, the simulated imaging degradation of visual markers obeyed the practical pattern that higher turbidity and longer optical paths lead to darker targets, blurrier contours, and more challenging recognition.
Building upon the above, a unified underwater scenario was constructed in Webots to validate the optical imaging and recognition performance of two representative visual markers: an LED point-light marker and a QR-code marker. The LED marker was configured with l = 0.5 m and an emitting radius of r = 0.08 m, while the QR-code marker adopted the configuration ID = 3 with a side length of 0.2 m. These settings represent point-feature and planar-feature targets, respectively. Marker parameters were held constant throughout the experiments, and environmental parameters were varied to systematically assess how optical degradation influences marker visibility and recognition difficulty.
For the environment, the turbidity parameter and the underwater light-attenuation parameter k were discretized and combined, guided by typical ranges of water attenuation coefficients, to cover representative operating conditions from clear near-surface water to moderately turbid mid-depth water. The turbidity parameter characterizes the bulk scattering and absorption strength of the medium, with a theoretically feasible range of approximately 0–1.5 m⁻¹; smaller values indicate clearer water and larger values higher turbidity. To span conditions from (idealized) zero turbidity to moderate turbidity, it was set to 0, 0.2, and 0.5, corresponding to a reference baseline, mildly turbid nearshore or slightly disturbed waters, and moderately turbid waters with substantially reduced visibility. In addition, k was introduced to model the exponential decay of light intensity with depth d and was set to 0, 0.3, and 0.7, representing a near-surface baseline with negligible additional attenuation, a moderate-attenuation condition for shallow-water or near-field operations, and a strong-attenuation condition emulating dimmer mid-depth environments with reduced contrast.
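A minimal sketch of the kind of macroscopic degradation described above, assuming a multiplicative combination of Beer-Lambert path attenuation (with a bulk turbidity coefficient, here called c) and exponential depth attenuation exp(-k·d); the exact functional form used by the platform is not reproduced here:

```python
import math

def degraded_intensity(i0, c, k, d, path_len):
    """Apparent target intensity under a sketch of the turbidity-depth
    attenuation model. i0: source intensity; c: bulk turbidity coefficient
    (1/m); k: depth-attenuation coefficient (1/m); d: depth (m);
    path_len: camera-to-target optical path (m).
    The multiplicative combination is an assumption, not the paper's model."""
    return i0 * math.exp(-c * path_len) * math.exp(-k * d)
```

Sweeping c over {0, 0.2, 0.5} and k over {0, 0.3, 0.7} reproduces the intended monotone trend: darker, lower-contrast targets as turbidity and depth grow.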
For the point-light-source marker observed at 5 m, increasing turbidity (from clear to moderate) together with increasing water depth (0–5 m) resulted in a pronounced reduction in apparent brightness and a progressive loss of edge definition of the luminous spot; ultimately, only coarse localization was possible and a well-defined boundary could not be resolved, as illustrated in Figure 11. For the QR-code marker observed at 1 m, a monotonic decrease in black–white contrast and gradual blurring of the internal pattern were observed as turbidity and depth increased; under moderate turbidity conditions at 5 m depth, the module boundaries became difficult to distinguish, leading to a substantial decline in recognition performance, as shown in Figure 12. In summary, both point-type and planar markers exhibited the same degradation tendency under these conditions, with environmental deterioration causing brightness attenuation, reduced contrast, and increased recognition difficulty.
Taken together, the experimental results show that the proposed underwater optical-environment simulator reproduces—over a broad range of settings—the way turbidity and water depth degrade the imaging quality of both point and planar visual landmarks and that its overall trends match those observed in real underwater conditions. This supports the validity of the turbidity and light-attenuation modeling adopted here and suggests that the simulator can supply reliable, controllable degradation samples for designing image-preprocessing steps and evaluating robustness in underwater visual localization, without resorting to detailed physical-optics solvers.
3.2.2. AUV Disturbance Rejection Capability Validation
Variations in fluid properties were treated as a principal external disturbance affecting AUV attitude stability and the accuracy of vision-based docking. To emphasize the AUV’s motion-level disturbance-rejection capability and to verify that the simulation environment could faithfully capture this process, a station-keeping scenario was implemented: the AUV was constrained to maintain a prescribed pose in three-dimensional space, and compensatory thrust was actively produced by omnidirectional thrusters in response to changes in the ambient flow field or fluid properties. Because the camera was rigidly mounted on the vehicle, its spatial pose was entirely determined by the AUV’s overall motion; consequently, the AUV’s ability to maintain stable motion under disturbance conditions was directly manifested as camera-pose stability during the vision-based docking task.
In the simulation tests, flow disturbances of varying directions and intensities were imposed, and the fluid-property parameters were adjusted to examine the response characteristics of the vertical thrusters, lateral thrusters, and the main propulsor, as shown in
Table 5. The results showed that both vertical disturbances that induced surfacing/sinking tendencies and horizontal disturbances that drove fore–aft or left–right deviations were counteracted by compensatory thrust generated in the direction opposite to the disturbance. Furthermore, variations in disturbance intensity were accompanied by corresponding, physically reasonable changes in thruster output. This qualitative relationship was consistent with the motion response of an AUV subject to flow-field interference in real underwater environments, suggesting that the simulated AUV exhibited engineering-realistic disturbance-rejection capability and could maintain relatively stable platform attitude and camera pose under perturbation conditions. The observations also corroborated the fidelity and effectiveness of the disturbance modeling in the simulation environment, supporting its use for credible evaluation of camera motion and steady-state characteristics in vision-based docking localization tasks under flow disturbance conditions.
3.2.3. Effects of Obstacle Occlusion on Visual Marker Detection Performance
In the occlusion-impact experiments, the Poisson-distributed occlusion model described above was used to progressively increase the number of small-volume obstacles within the simulated water region. Three density levels were considered by setting the obstacle count to 600, 6000, and 60,000, and the imaging performance of point-light markers and QR-code markers was compared, as shown in Figure 13.
As can be seen in Figure 13, when the number of obstacles is 600, occlusion blobs in the field of view are relatively sparse and in most cases appear only in background regions, exerting limited influence on the contours and details of both point-light and QR-code markers; target edges and internal structures remain largely clear and discernible. When the obstacle number increases to 6000, visible suspended particles increase markedly; more frequent local occluding highlights appear around the point-light marker, and portions of QR-code modules are occluded, leading to an evident loss of overall texture completeness. When the obstacle number further increases to 60,000, the image exhibits a strongly occluded state dominated by high-density particles; the light-spot contour of the point-light marker is often interrupted by occlusion blobs, with pronounced discontinuities in brightness distribution and even local disappearance, while the QR pattern shows large-area block occlusions and structural fragmentation, with blurred or even barely recognizable boundaries between the original black-and-white modules. The comparison between the two marker types suggests that, due to its smaller size and highly concentrated information, the QR code is more sensitive to local occlusions, and its pattern integrity degrades rapidly at medium-to-high obstacle densities; by contrast, the point-light marker has a sparser distribution of feature points and can still retain a relatively clear overall geometric layout under moderate occlusion conditions. Overall, the progressive increase in obstacle number significantly alters the imaging clarity and structural integrity of the markers, providing a controllable imaging scenario for subsequent analysis of recognition performance and robustness differences among visual algorithms under multi-level occlusion conditions.
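A homogeneous spatial Poisson occlusion field, conditioned on the realized obstacle count, reduces to placing obstacle centers i.i.d. uniformly in the water volume. A minimal sketch (function name, the box parameterization, and the seed are illustrative assumptions):

```python
import random

def place_obstacles(n, dims, seed=0):
    """Place n small-obstacle centers uniformly in an axis-aligned box of
    dimensions dims = (length, width, depth) in metres. This is a
    homogeneous spatial Poisson process conditioned on the count n."""
    rng = random.Random(seed)
    return [tuple(rng.uniform(0.0, s) for s in dims) for _ in range(n)]
```

Re-running the sampler with n = 600, 6000, and 60,000 regenerates the three density levels reproducibly, which is what makes the occlusion scenarios controllable.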
3.3. AUV Visual Docking Location Simulation Experiment
3.3.1. Visual Docking Location Experiment Workflow
To validate the applicability of the developed Webots-based AUV visual docking simulation platform to underwater visual localization modeling and evaluation, an underwater docking simulation experiment covering the complete visual localization pipeline was designed and implemented. The experiments were conducted in a pool-style simulated environment measuring 15 m (length) × 5 m (width) × 8 m (depth). The scene included a simulated AUV equipped with a monocular vision sensor and a docking target system composed of multiple types of visual markers [36]. Key environmental factors, namely illumination, turbidity, and occlusion, were parameterized to generate operating conditions with different visibility ranges and imaging qualities, under which the AUV performed a continuous visual localization task from the long-range approach to close-range docking [14]. Following a typical AUV visual docking workflow, the vision-dominated docking procedure was divided into four consecutive stages, as shown in Figure 14, focusing on the availability of visual cues at each stage, the resulting pose-estimation accuracy, and how localization performance supported the process:
(1) Initial pose adjustment and docking initiation: Based on the predefined trajectory and initial navigation information, the AUV adjusted its attitude and heading toward the docking area, such that the target was about to enter the camera’s observable range and basic observability was established for subsequent visual takeover.
(2) Long-range coarse visual localization using LED point lights: Once the point lights appeared in the camera view with clearly discernible contours, the target’s approximate direction and coarse relative bearing were inferred through contour detection and analysis of their image-plane distribution, and the stand-off distance was reduced to a range suitable for recognizing near-field markers.
(3) Short-range accurate pose estimation using QR-code markers: When the QR-code markers became clearly visible, their geometric structures were leveraged to solve the monocular relative pose, progressively reducing the pose error between the AUV and the docking device to enable fine approach.
(4) Final docking and relative-distance convergence: The AUV approached the docking device at low speed, while the remaining translational and attitudinal errors were driven to converge, with attention restricted to the terminal process in which the relative pose and distance approached zero, in order to assess whether monocular visual localization could guarantee relative distance and attitude accuracy in the final docking stage.
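The coarse bearing inference of stage (2) can be sketched under the pinhole model: the image-plane offset of a detected LED-spot centroid from the principal point maps to an angular direction relative to the optical axis. The helper name and signature below are illustrative, not the paper's implementation:

```python
import math

def led_bearing(u, v, fx, fy, cx, cy):
    """Coarse (horizontal, vertical) bearing angles, in radians, of an
    LED-spot centroid (u, v) relative to the camera optical axis, using
    pinhole intrinsics fx, fy, cx, cy."""
    return (math.atan2(u - cx, fx), math.atan2(v - cy, fy))
```

Averaging the bearings of several detected spots gives the approximate direction of the docking device used to reduce the stand-off distance before the QR-code stage takes over.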
In addition, to facilitate a comparison and validation of the simulation results, an ideal indoor experimental setup was constructed, and images of a physical QR code and an LED point-light landmark were acquired, as shown in Figure 15. The real images were incorporated primarily to examine whether the simulated imaging characteristics of the QR code and the point light source agree with those observed in practice, thereby indicating that the simulated visual observation sequence can faithfully represent the imaging features encountered during actual AUV docking.
Figure 14. AUV visual docking process.
Figure 15. The indoor visual docking process of an AUV.
To ensure consistency between simulation and experiment, the same vision platform and camera configuration were adopted in the physical tests as in the simulation environment. In particular, the onboard camera was set to a resolution of 640 × 480 pixels, identical to the simulated camera model, which is crucial for the visual localization task so that the image scale, field of view, and pixel-level measurement characteristics remain consistent. Moreover, the physical QR code and point-light landmarks were fabricated with exactly the same geometric dimensions as those defined in the simulation. With this design, the comparison between simulated and real underwater experiments was performed under strictly matched conditions, enabling us to directly assess whether the camera images acquired in the real underwater environment exhibit visual features and geometric cues that are consistent with those generated in the simulation.
3.3.2. Emulating Practical AUV Docking Visual Sequences
To emulate the characteristics of the visual observation sequence in practical AUV docking, six representative docking poses were arranged along the docking axis at 2 m intervals, and the corresponding simulated image sequence was obtained, as shown in Figure 16. In the simulation, the docking distance was defined as the Euclidean distance from the AUV camera optical center to the plane containing the docking-device markers [37]. The geometric parameters of the visual markers were set as follows: LED point-light marker length l = 0.5 m, emitting-region radius r = 0.08 m, and QR-code marker-block side length 0.2 m.
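The docking distance defined above, from the camera optical center to the marker plane, is a standard point-to-plane distance. A minimal sketch (helper names are illustrative):

```python
import math

def docking_distance(cam_center, plane_point, plane_normal):
    """Euclidean distance from the camera optical center to the plane
    containing the docking markers: |(p - q) . n| / ||n||, where q is any
    point on the plane and n its (not necessarily unit) normal."""
    diff = [p - q for p, q in zip(cam_center, plane_point)]
    dot = sum(d * n for d, n in zip(diff, plane_normal))
    norm = math.sqrt(sum(n * n for n in plane_normal))
    return abs(dot) / norm
```

Evaluating this at the six arranged poses yields the 2 m-spaced docking-distance sequence used to index the simulated images.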
Based on the above marker scale and the optical imaging conditions, two typical observation regimes (long-range and short-range) were investigated, with a docking distance of 5 m used as the boundary between them.
Figure 17 shows the visual docking acquisition results obtained with physical fiducial markers in an ideal indoor environment. Overall, the measured imaging characteristics of both the point light source and the QR code were in close agreement with the simulation results: the light-spot morphology and spatial distribution were well reproduced, and the QR code exhibited comparable edge definition and textural structure. A side-by-side comparison further indicated that, with the scale parameters and imaging settings held constant, the simulated “outdoor” docking visual scene produced imaging effects that were largely consistent with those observed in the ideal indoor condition [38,39]. In particular, the distance-dependent trends of the light-spot size and brightness, as well as the changes in QR-code recognizability and geometric deformation with viewing range, followed the same patterns in simulation and experiment. These results support the validity of the proposed simulation-based visual observation model for representing the imaging behavior of the point light source and QR code during AUV docking, and they underpin subsequent algorithm design and performance evaluation based on simulated data.
When the docking distance exceeded 10 m, the marker’s global outline was no longer clearly discernible because of underwater light attenuation and turbidity-induced scattering; only a faint, low-contrast target region remained visible, indicating that the system was operating near the recognition limit with the current marker scale and optical configuration. In the far-range interval of 6–10 m, QR-code texture information could not yet be resolved, whereas the LED point sources were comparatively salient against the background. The corresponding light spots exhibited sharp contours and a stable spatial arrangement, which provided the dominant visual cues for point-feature-based coarse localization and heading estimation. At close range (within 4 m), QR-code texture details became reliably resolvable, enabling the system to transition smoothly from LED-point guidance to QR-based accurate pose estimation and thus to achieve high-precision visual guidance in the terminal docking phase. The observed distance-dependent characteristics, with 5 m serving as the boundary between far and near ranges, were consistent with the intended system design. These results not only confirmed that the simulation-based image acquisition pipeline reproduced key distance-varying effects of underwater optical imaging but also delineated the effective operating intervals of the LED-dominant far-range guidance stage and the QR-texture-dominant near-range fine positioning stage, providing a basis for marker-parameter configuration.
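The distance-dependent regimes described above (operation near the recognition limit beyond 10 m, LED-dominant coarse guidance in the far range, and QR-based fine pose estimation at close range, with 5 m as the nominal boundary) can be summarized in a small stage selector. The thresholds mirror the text; the function and label names are illustrative:

```python
def guidance_stage(distance_m, near_far_boundary=5.0, recognition_limit=10.0):
    """Map the current docking distance to the dominant visual cue."""
    if distance_m > recognition_limit:
        # Only a faint, low-contrast target region remains visible.
        return "near-limit"
    if distance_m > near_far_boundary:
        # LED point sources drive coarse localization and heading estimation.
        return "led-coarse"
    # QR texture is resolvable: switch to accurate pose estimation.
    return "qr-fine"
```

A terminal-docking controller would transition through these stages monotonically as the distance shrinks, which is the smooth LED-to-QR handover the experiments confirmed.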
3.3.3. Visual Docking Location Experiment in Diverse Simulated Environments
To quantitatively evaluate the vision-based docking localization performance of an AUV under varying environmental conditions, and to validate the effectiveness and realism of the developed simulation platform at the visual-perception level, four comparative experiments were conducted within a unified simulation framework. Exploiting the platform’s parameterized configuration of illumination, water turbidity, occlusion, and disturbance flow velocity, the experiments examined the effects of four key factors—turbidity, depth-induced light attenuation, current velocity, and the number of randomly deployed obstacles—on the visual docking localization process. In each experiment, a controlled-variable design was adopted: while the other three environmental variables were held constant, only the target factor was varied and its disturbance intensity was increased stepwise, enabling a systematic assessment of the robustness of the monocular visual docking localization algorithm under representative conditions involving diverse illumination, multiple turbidity levels, severe occlusion, and complex flow fields. The AUV was required to complete a continuous monocular visual localization task from the long-range approach to close-range docking for different visibility ranges and imaging conditions. To better approximate the progressive degradation observed in real seas (“clear–moderately degraded–severely degraded”), sampling and evaluation were performed continuously from the moment the AUV entered the effective docking range, allowing the impacts of visual degradation on recognition performance and localization accuracy to be quantified.
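The controlled-variable design can be sketched as a parameterized environment configuration from which one-factor-at-a-time test conditions are generated. Field names and baseline values below are illustrative, not the platform’s actual API:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class EnvConfig:
    """One simulated environmental condition for a docking trial."""
    turbidity: str = "clear"   # clear / slightly turbid / moderately turbid
    depth_m: float = 1.0       # controls depth-induced illumination attenuation
    current_mps: float = 0.0   # lateral disturbance flow velocity
    n_obstacles: int = 0       # randomly deployed small obstacles

def one_factor_sweep(baseline, factor, levels):
    """Vary a single environmental factor while the others are held constant."""
    return [replace(baseline, **{factor: level}) for level in levels]

base = EnvConfig()
turbidity_runs = one_factor_sweep(
    base, "turbidity", ["clear", "slightly turbid", "moderately turbid"])
```

Each of the four experiments then corresponds to one such sweep (over turbidity, depth, current velocity, or obstacle count) against the same baseline.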
In the turbidity impact experiments, the flow velocity was set to 0 m/s, the water depth to 1 m, and the number of randomly distributed small obstacles to 0. Turbidity was then increased stepwise (clear water, slightly turbid, and moderately turbid). Both the absolute and relative errors of vision-based localization increased markedly with turbidity. In
Figure 18, a representative docking trial is reported, where the horizontal axis denotes the ground-truth distance between the AUV and the docking target and the vertical axis denotes the corresponding localization error. Although both error curves converged as the AUV approached the target, higher turbidity substantially slowed the convergence and resulted in a larger steady-state error. Furthermore, statistics over 100 consecutive docking trials with random initial positions indicated that the recognition success rate decreased significantly as turbidity increased, as presented in
Table 6. In particular, when the marker was changed from a high-contrast QR code to a low-SNR LED point light source, a step-like increase in localization error was observed, further validating the detrimental effect of turbidity on image feature extraction; this finding is consistent with practical underwater optical observation characteristics.
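The per-condition statistic reported above (recognition success rate over 100 docking trials with random initial positions) is a simple aggregation; the boolean-per-trial representation below is an assumption for illustration:

```python
def success_rate(trials):
    """Fraction of docking trials in which marker recognition succeeded.

    Each trial is a bool: True if recognition (and hence localization)
    succeeded for that randomly initialized docking run.
    """
    return sum(trials) / len(trials)

# e.g. 87 successes out of 100 trials under one turbidity level (illustrative)
rate = success_rate([True] * 87 + [False] * 13)
```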
In the depth (illumination attenuation) experiments, the flow velocity was set to 0 m/s, the turbidity was set to slightly turbid, and no random small obstacles were introduced. The depth was increased stepwise (0 m, 1 m, and 5 m) to simulate illumination attenuation with increasing depth. The results showed that localization error increased with depth, whereas the recognition success rate decreased accordingly. This trend was consistent with that observed in the turbidity experiments, indicating that illumination conditions directly affect visual localization performance.
Figure 19 presents the time history of localization error during a single docking process; at greater depths, the error exhibited stronger oscillations and required a longer time to converge. Statistics from 100 trials further revealed that increasing depth reduced the system’s tolerance to initial position deviations, and visual localization failed in some cases when the error exceeded the recognition range, demonstrating that depth-induced illumination attenuation degrades localization accuracy, as shown in
Table 7.
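The convergence behavior discussed above (stronger oscillations and longer settling at greater depth) can be quantified with a settling-time metric over the error history; the tolerance value and function name are illustrative assumptions:

```python
def settling_index(errors, tolerance):
    """First sample index after which the localization error stays
    within tolerance for the rest of the history; None if it never settles."""
    idx = None
    for i, e in enumerate(errors):
        if e > tolerance:
            idx = None          # excursion resets any earlier candidate
        elif idx is None:
            idx = i             # tentative settling point
    return idx

# Error history that oscillates before converging (illustrative values)
history = [0.9, 0.5, 0.3, 0.4, 0.15, 0.12, 0.08, 0.09]
i = settling_index(history, tolerance=0.2)
```

A deeper (more attenuated) condition would show a larger settling index or `None`, matching the reported failure cases when the error exceeded the recognition range.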
In the flow-velocity disturbance experiment, a test condition with a depth of 1 m, slight turbidity, and zero randomly placed small obstacles was used, and the lateral current velocity was increased stepwise (0, 0.5, and 1.0 m/s). The results showed that the visual localization error was only weakly affected by changes in flow velocity, indicating that image acquisition and feature extraction were largely insensitive to the current speed. In contrast, the recognition success rate decreased significantly as the flow velocity increased. As shown in
Figure 20, the lateral-velocity error in a single trial exhibited increasingly pronounced temporal fluctuations, suggesting that the current disturbance primarily affected the AUV dynamics by inducing heading oscillations and trajectory instability, thereby elevating the likelihood of visual recognition failure, as shown in
Table 8. These findings indicate that visual localization algorithms should be further optimized for dynamic flow environments to suppress instability induced by hydrodynamic disturbances.
In the obstacle-occlusion experiment, the water depth was set to 1 m, the turbidity condition was set to slightly turbid, and the flow velocity was set to 0 m/s. Within a 600 m³ simulated water volume, the number of small-volume obstacles was progressively increased (600, 6000, and 60,000). As shown in
Figure 21, for the successfully recognized cases, both the relative and absolute localization errors varied only marginally; however, as the obstacle count increased, marker failures emerged within specific distance intervals. Compared with QR-code markers, point-light-source markers maintained a longer effective recognition distance under severe occlusion conditions. This difference was attributed to the small physical size of QR codes, which made them highly sensitive to local occlusion and prone to decoding failure even at short range under partial blockage conditions, whereas point-light-source markers, with larger inter-feature spacing, preserved identifiable global geometry despite partial central occlusion, as shown in
Table 9. Overall, increasing the number of small-volume obstacles substantially reduced recognition success and narrowed the usable ranging range, indicating that future vision algorithms should further enhance occlusion-aware prediction and completion capabilities to achieve robust recognition across different marker types.
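The narrowing of the usable ranging range under occlusion can be expressed as the longest contiguous distance interval whose per-bin recognition success rate stays above a threshold. The bin layout, rates, and threshold below are illustrative:

```python
def usable_range(distance_bins, success_rates, threshold=0.9):
    """Longest contiguous run of distance bins meeting the success
    threshold; returns (min_dist, max_dist) or None if no bin qualifies."""
    best, start = None, None
    for i, rate in enumerate(success_rates):
        if rate >= threshold and start is None:
            start = i
        if start is not None and (rate < threshold or i == len(success_rates) - 1):
            end = i if rate >= threshold else i - 1
            span = (distance_bins[start], distance_bins[end])
            if best is None or span[1] - span[0] > best[1] - best[0]:
                best = span
            start = None
    return best

bins = [2, 4, 6, 8, 10]                 # docking distances in metres
rates = [0.98, 0.95, 0.92, 0.60, 0.30]  # per-bin success under heavy occlusion
rng = usable_range(bins, rates)
```

Comparing this interval across marker types would reproduce the observation that point-light markers retain a longer effective recognition distance than QR codes under severe occlusion.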
4. Conclusions
An AUV simulation platform was developed for underwater vision-based docking localization. The entire localization workflow was systematically modeled and simulated by accounting for key factors such as visual marker diversity, underwater optical imaging characteristics, and current-induced disturbances and occlusions. The platform incorporated parameterized turbidity and illumination models, configurable current and occlusion scenarios, and flexible configurations of multiple marker types, enabling controllable reproduction and broad coverage of complex underwater visual conditions, as well as standardized integration and benchmarking of docking localization algorithms. The results showed that the platform can effectively differentiate algorithm performance under varying illumination and turbidity, current disturbance, and severe occlusion conditions, providing a reliable and efficient environment for comparative evaluation and iterative refinement. In addition, it helps reduce the risk and cost of real-world underwater docking localization experiments and supports engineering-oriented deployment of related methods.
Although a simulation-based verification environment for underwater visual docking was developed in this work, with targeted extensions to optical modeling, fiducial-marker design, scene dynamics, and camera configuration, several limitations remain and should be addressed in future research.
- 1. The current platform was primarily designed for visual localization, and the hydrodynamic and control components were modeled in a simplified manner; complex fluid-dynamic effects and high-fidelity realizations of control laws were not fully represented, which limited its utility for controller design and performance assessment. More complete hydrodynamic and control models should therefore be incorporated to enable closed-loop co-simulation and joint optimization of visual perception and control strategies.
- 2. The optical and camera models were largely based on parametric approximations and may not adequately capture the highly variable underwater light field and the distribution of suspended particles observed in real marine environments, leaving a potential simulation-to-reality gap in rendered imagery. This gap could be reduced by refining light-transport and scattering models using measured data, while also considering multimodal sensor configurations and marker designs to improve robustness under challenging conditions.
- 3. The present environment included a relatively limited set of scene elements and target types, focusing mainly on dynamic occlusion and intermittent loss of markers and obstacles, and did not sufficiently cover diverse underwater structures, ecological factors, or long-term environmental variations. Future work should enrich both static and dynamic assets and incorporate environment-evolution mechanisms to build a higher-fidelity simulation testbed applicable to a broader range of underwater tasks.