1. Introduction
Autonomous Underwater Vehicles (AUVs) are indispensable platforms for ocean exploration, environmental monitoring, and military operations, and their autonomy and reliability directly influence mission execution efficiency [1]. Autonomous vision-based docking between an AUV and a docking station is a key enabling technology for underwater energy replenishment, data recovery, and long-term resident observation [2,3]. However, conducting docking experiments directly in real ocean environments remains challenging [4,5]. Sea trials are expensive and typically depend on large support platforms and prolonged offshore operations. At the same time, underwater visual perception is substantially degraded by distance-dependent light attenuation, turbidity-induced scattering and contrast loss, and color distortion, which collectively reduce the robustness and stability of marker detection and localization (e.g., of beacon lights or structured patterns) in complex waters. Moreover, it is difficult to precisely reproduce and systematically control diverse combinations of turbidity, illumination, and occlusion conditions in physical experiments, which limits comprehensive, repeatable validation and robustness evaluation of vision-based docking localization algorithms. Therefore, a low-cost, controllable, and repeatable simulation-based validation environment is urgently needed to systematically evaluate and compare underwater visual docking localization methods and to support algorithm design and iterative optimization.
In recent years, simulation technologies for underwater robots have made notable progress, primarily focusing on motion control, path planning, and sensor modeling. Platforms such as Gazebo, Unity, and Unreal Engine (UE) provide fundamental support for hydrodynamics and multi-sensor integration and have become mainstream choices for simulating ground and aerial robots [6,7,8]. Wen et al. [9] implemented physics-engine simulation of an autonomous underwater glider (AUG) on the Gazebo platform and realized feedback of the physical model in digital space via WebGL-based 3D visualization; by incorporating an improved artificial potential field method and edge-computing techniques, they achieved formation path-optimization control for multiple AUGs. Wang et al. [10] proposed an underwater robot dynamic simulation platform based on UE, AirSim, and a Distributed Simulation Algorithm Platform (DASP), which provides a safe, efficient, and reproducible virtual test environment for robot control algorithms through water-body modeling, lighting and shading rendering, and physical interaction; however, scene construction relies heavily on manual parameter tuning and substantial computational resources, and its hydrodynamic and communication models still require refinement, so it cannot yet fully replace real-water testing. Deng et al. [11] developed an underwater robot simulation framework using MATLAB/Simulink 2021B and UE4, enabling rapid deployment and validation of sensor noise models in a highly realistic virtual underwater environment; through noise-injection experiments on stereo cameras and radar, they verified its effectiveness in improving simulation fidelity, but the framework currently supports only Gaussian noise, and its extensibility to additional sensor types and noise models remains limited. Wang et al. [12] presented a co-simulation approach based on Webots [13] and MATLAB/Simulink for modeling and algorithm verification of underwater robot control systems: Webots was used to construct the physical model, Simulink was used to implement controllers for velocity and attitude tracking, and real-time data exchange was achieved via APIs to improve simulation credibility and development efficiency; nevertheless, more complex hydrodynamic and noise models have not yet been incorporated.
Existing studies have largely focused on AUV navigation control or sonar sensing, whereas simulation platforms specifically designed for vision-guided docking and covering the full “perception–localization–docking” pipeline remain scarce. Li et al. [14] developed a virtual-reality-based online simulation system for UUV underwater docking to visualize the docking process and validate algorithms. The system integrated four navigation modalities with fuzzy PID control, offering real-time, interactive, and immersive operation; however, its complex architecture imposed strong requirements on communication synchronization and hardware, and its extensibility still needs improvement. Zhang et al. [15] fused visual, inertial, and pressure-sensor information and constructed a virtual docking simulation system by integrating vehicle dynamics, control algorithms, perception modules, and environmental disturbances; robust optically guided docking was achieved using an unscented Kalman filter. Nevertheless, the framework was limited to single-beacon scenarios and has not yet been comprehensively validated in real waters to assess its generalization capability and long-term stability. Jena et al. [16] built an AUV underwater docking simulation platform based on Unity3D 2020 and MATLAB 2020B, incorporating YOLOv3 for real-time detection of the optical docking target and vision-based guidance; the platform supported six-DOF motion simulation, image acquisition and processing, and TCP/IP communication with MATLAB. However, the simulation environment was often idealized, particularly in the modeling of critical underwater visual factors such as light attenuation, turbidity variations, and target occlusion, which hinders faithful reproduction of complex underwater optical scenes and thereby limits systematic robustness evaluation of vision-based docking localization algorithms.
In summary, existing vision-based docking simulations still suffer from pronounced limitations. On the one hand, underwater optical effects, such as depth-dependent illumination attenuation and turbidity, are often modeled in an overly coarse manner, making it difficult to faithfully capture image degradation and its impact on target recognition under complex water conditions. On the other hand, the morphology and discriminative features of visual docking markers are not simulated with sufficient fidelity, which limits compatibility with diverse vision algorithms and reduces the diversity of test scenarios and performance evaluations [17]. Moreover, although Webots has been widely used as a general-purpose robotic simulation platform that integrates a physics engine and multiple sensors in terrestrial and aerial domains, its capability for underwater visual-scene modeling tailored to AUV vision-based docking remains limited, and a dedicated system-level solution is still lacking. Overall, an integrated simulation of challenging underwater visual conditions, particularly multi-illumination, multi-turbidity, and severe occlusion, remains largely absent, highlighting an urgent need for more realistic simulation tools with finer-grained modeling of vision-related factors [18].
Against this backdrop, for the design and evaluation of vision-based localization methods, developing an AUV docking simulation platform with underwater visual effects at its core has become a key link between algorithm development and sea-trial validation. By integrating (i) underwater illumination and depth-attenuation modeling, (ii) a tunable turbidity-induced imaging degradation model, (iii) mechanisms for generating multiple types of visual markers and complex occlusion scenarios, and (iv) approximate kinematic modeling into Webots, systematic and quantitative simulation-based testing and comparative analysis of vision-based docking localization algorithms can be conducted in low-cost, parameter-controllable, and repeatable settings. This, in turn, can substantially reduce the risks and resource consumption associated with sea trials and accelerate iterative optimization and engineering deployment. The contributions of this paper are as follows:
1. An underwater docking fiducial system was developed to support multiple classes of vision algorithms. By exploiting hierarchical point–line–plane features, a closed-loop modeling pipeline was established that spans fiducial design, visual detection, and pose estimation, providing a unified visual measurement interface for end-to-end vision-in-the-loop AUV docking simulation.
2. A parametric underwater optics-and-occlusion model was established by extending the Webots fog model with depth-dependent attenuation, adjustable turbidity, and randomized occlusion, enabling controllable simulation of representative underwater illumination and turbidity conditions.
3. An integrated docking-process simulation that accounts for both AUV kinematics and camera field-of-view constraints was implemented, along with a batch benchmarking framework for quantitative comparison of localization accuracy, docking success rate, and robustness across multiple trajectories and operating conditions.
The remainder of this paper is organized as follows.
Section 2 describes the overall methodology and implementation of a simulation platform for underwater vision-based docking, including AUV motion and camera modeling, a multi-type visual marker suite, underwater illumination and attenuation models, and a scenario generation mechanism that accounts for occlusions and current-induced disturbances.
Section 3 presents the platform’s operation and evaluation; visual localization algorithms were quantitatively tested under representative conditions with varying illumination, turbidity, and occlusion, which confirmed the platform’s functional completeness and the accuracy of its results.
Section 4 concludes the paper and discusses how the platform supports the design, comparative evaluation, and iterative optimization of underwater docking localization algorithms, as well as directions for future extension.
3. Results and Discussion
Progressive validation of an underwater visual docking localization simulation platform was performed across three levels: AUV motion, environment/landmarks, and task-level docking. At the AUV level, the simulated camera intrinsics were calibrated and benchmarked against those of the physical underwater camera to ensure consistency in imaging geometry, and an approximate motion model was used to control AUV attitude and translation so that the resulting trajectories matched typical docking maneuvers. At the environment and landmark level, the imaging behaviors of representative markers (e.g., point light sources and QR codes) were analyzed under varying turbidity, light attenuation, and occlusion conditions, confirming that key underwater-vision phenomena, namely signal attenuation, contrast degradation, and partial or complete target occlusion, were reproduced. On this basis, a classical vision-based localization method was used to systematically test the full visual docking process under four representative operating conditions (turbidity variation, attenuation variation, disturbance-speed variation, and occlusion variation), and the condition-dependent trends of localization accuracy and target recognition success were compared between simulation and real underwater docking scenarios. The results demonstrated that the performance trends observed in simulation were highly consistent with empirical trends from practical underwater visual docking, thereby providing strong behavioral evidence for the realism and credibility of the proposed simulation environment.
3.1. Simulation Validation of AUV
3.1.1. Validation of AUV Maneuvering Capability
In underwater vision-based docking and localization, the fidelity with which the simulated AUV platform reproduces the camera motion encountered in real operations directly determines the credibility of subsequent algorithmic evaluations. During docking, the camera’s spatial trajectory, body attitude, and viewing-angle variations correspond, respectively, to the camera position, motion speed, and observation pose in the localization problem. Accordingly, emphasis is placed not on how the motion is generated, but on whether the simulation carrier can reliably realize the motion outcomes required for docking scenarios along three essential dimensions: vertical displacement capability through depth variation, along-track progression capability through speed variation, and viewpoint/line-of-sight adjustment capability through heading variation. By covering these motion outcomes, the platform can reproduce representative sets of camera trajectories and pose changes that are typical of docking operations, thereby providing a motion foundation for systematic evaluation of vision-based docking localization algorithms under diverse trajectories and viewing conditions.
On this basis, the motion performance of the platform was examined in Webots. Ground-truth pose sequences were obtained by directly reading and converting the position and attitude variables provided at the Webots system level, yielding accurate reference values for quantitative assessment of visual localization algorithms. The step-change test curves in Figure 9 indicate that the platform exhibits clear and repeatable motion behavior in all three dimensions. For depth variation, the vehicle reaches the target depth rapidly and maintains a stable vertical position thereafter. For speed variation, velocity changes remain continuous and consistent across trials, supporting steady progression and displacement along the docking path. For heading variation, commanded heading changes are distinctly realized and subsequently held, ensuring that the camera viewing direction and observation angle evolve as intended. Compared with conventional simulations that provide only images and coarse trajectories, the present platform supplies high-precision, system-consistent ground-truth position and orientation data. The response curves further substantiate that the achievable depth, speed, and heading variations satisfy the operational requirements of the camera as a visual sensing device in docking, thereby enabling a quantifiable and repeatable benchmark for evaluating underwater vision-based docking localization algorithms.
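As a concrete illustration, a heading reference can be recovered from the rotation matrix that a Webots supervisor exposes via a node's getOrientation() call (which returns the 3 × 3 rotation matrix flattened row-major). The sketch below is a minimal, hypothetical helper, not the platform's actual code; the z-up axis convention is an assumption and should be adapted to the world coordinate system in use:

```python
import math

def heading_from_orientation(r):
    """Yaw angle (radians) from a 3x3 rotation matrix flattened row-major,
    e.g. as returned by a Webots Supervisor node's getOrientation().
    Assumes a z-up world; helper name is illustrative."""
    # For z-up, yaw = atan2(R[1][0], R[0][0]) = atan2(r[3], r[0]).
    return math.atan2(r[3], r[0])
```

Logging such system-level values alongside the commanded step changes yields the ground-truth sequences used for the quantitative comparisons above.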
3.1.2. Validation of Camera Carried by AUV
In AUV visual docking simulation, the credibility of the results hinges on whether the camera model reproduces geometric imaging behavior consistent with real engineering systems. Accordingly, the intrinsic-parameter configuration of the Webots camera is calibrated and validated to ensure agreement with a practical engineering camera in key parameters, including focal length, principal-point location, resolution, and field of view. This consistency supports engineering-level comparability in the distribution of feature points on the image plane and in the mapping between image measurements and spatial geometry. Because distortion in real underwater cameras is diverse, strongly lens-dependent, and typically compensated through calibration and undistortion in practice, complex distortion models are not explicitly constructed in the simulation. Instead, Webots-generated images are treated as outputs after distortion correction, and the modeling focuses on the general intrinsic geometry that most strongly affects visual-docking localization. This choice aligns with the engineering workflow of correcting distortion prior to localization and preserves the platform’s generality for comparative evaluation of different visual localization algorithms.
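Under the assumption that Webots images are already distortion-free, the intrinsic geometry discussed above reduces to the ideal pinhole projection. A minimal sketch (the helper name is illustrative, not from the paper):

```python
def project(point_cam, fx, fy, cx, cy):
    """Ideal (distortion-free) pinhole projection of a camera-frame point
    (X, Y, Z), Z > 0, onto the image plane using intrinsics fx, fy, cx, cy."""
    X, Y, Z = point_cam
    return (fx * X / Z + cx, fy * Y / Z + cy)
```

Because distortion is assumed corrected upstream, this mapping is the only geometric link the platform needs between image measurements and spatial structure.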
Building on the above, the classical Zhang calibration method was adopted to systematically calibrate the intrinsic parameters of the simulated camera and to analyze calibration errors, thereby validating the rationality of the proposed modeling assumptions. In Webots, a black-and-white chessboard calibration target was constructed, as shown in Figure 10. Multiple images were acquired by varying the relative pose between the camera and the target, and the intrinsic parameters (fx, fy, cx, cy) were estimated using Zhang’s method. The intrinsic ground-truth values were predefined in simulation, and five independent calibration trials were conducted at each of three resolutions (1080p, 720p, and 480p); the mean relative error for each intrinsic parameter was then computed. As shown in Table 4, the mean relative errors of all intrinsic parameters were below 1% across all resolutions. These results indicated that, within the Webots simulation environment, an engineering-consistent calibration workflow could stably recover high-accuracy camera intrinsics, and that the geometric imaging characteristics of the simulated camera were highly consistent with those of a real camera. This consistency ensured that subsequent simulation-based tests of the visual docking and localization algorithm were engineering-relevant and comparable in terms of feature-point acquisition and error levels.
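The per-parameter error metric reported in Table 4 can be sketched as follows (the helper name is an assumption; the paper does not give an implementation):

```python
def mean_relative_error(estimates, truth):
    """Mean relative error of repeated calibration estimates of one
    intrinsic parameter (e.g. fx) against its predefined ground truth."""
    return sum(abs(e - truth) / abs(truth) for e in estimates) / len(estimates)
```

Applying this to the five trials per resolution and per parameter yields the sub-1% figures summarized in Table 4.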
3.2. Simulation Validation of Environment
3.2.1. Effects of Turbidity and Attenuation on Visual Marker Recognition
Because visual localization prioritized stable and repeatable target position and pose estimation rather than an exact reproduction of the microscopic physics of underwater light transport, the optical modeling in this study emphasized consistency in macroscopic degradation trends. Specifically, as turbidity and water depth varied across a wide range, the simulated changes in image brightness, contrast, and target distinguishability were expected to follow the same direction of change—and be of comparable magnitude—as those observed in real underwater environments. This objective matched the design intent of the proposed turbidity–depth attenuation model, which aimed to generate representative degradation samples spanning large parameter ranges for image preprocessing and robustness evaluation, rather than to achieve high-fidelity optical simulation under a single operating condition. Accordingly, visualization and recognition experiments were conducted to verify that, under different turbidity and depth settings, the simulated imaging degradation of visual markers obeyed the practical pattern that higher turbidity and longer optical paths lead to darker targets, blurrier contours, and more challenging recognition.
Building upon the above, a unified underwater scenario was constructed in Webots to validate the optical imaging and recognition performance of two representative visual markers: an LED point-light marker and a QR-code marker. The LED marker was configured with l = 0.5 m and an emitting radius of r = 0.08 m, while the QR-code marker adopted the configuration ID = 3 with a side length of 0.2 m. These settings represent point-feature and planar-feature targets, respectively. Marker parameters were held constant throughout the experiments, and environmental parameters were varied to systematically assess how optical degradation influences marker visibility and recognition difficulty.
For the environment, the turbidity parameter and the underwater light-attenuation parameter k were discretized and combined, guided by typical ranges of water attenuation coefficients, to cover representative operating conditions from clear near-surface water to moderately turbid mid-depth water. The turbidity parameter characterizes the bulk scattering and absorption strength of the medium, with a theoretically feasible range of approximately 0–1.5 m⁻¹; smaller values indicate clearer water and larger values higher turbidity. To span conditions from (idealized) zero turbidity to moderate turbidity, it was set to 0, 0.2, and 0.5, corresponding to a reference baseline, mildly turbid nearshore or slightly disturbed waters, and moderately turbid waters with substantially reduced visibility. In addition, k was introduced to model the exponential decay of light intensity with depth d and was set to 0, 0.3, and 0.7, representing a near-surface baseline with negligible additional attenuation, a moderate-attenuation condition for shallow-water or near-field operations, and a strong-attenuation condition emulating dimmer mid-depth environments with reduced contrast.
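A minimal sketch of the kind of macroscopic degradation described above, assuming a multiplicative combination of Beer-Lambert path attenuation (with a bulk turbidity coefficient, here called c) and exponential depth attenuation exp(-k·d); the exact functional form used by the platform is not reproduced here:

```python
import math

def degraded_intensity(i0, c, k, d, path_len):
    """Apparent target intensity under a sketch of the turbidity-depth
    attenuation model. i0: source intensity; c: bulk turbidity coefficient
    (1/m); k: depth-attenuation coefficient (1/m); d: depth (m);
    path_len: camera-to-target optical path (m).
    The multiplicative combination is an assumption, not the paper's model."""
    return i0 * math.exp(-c * path_len) * math.exp(-k * d)
```

Sweeping c over {0, 0.2, 0.5} and k over {0, 0.3, 0.7} reproduces the intended monotone trend: darker, lower-contrast targets as turbidity and depth grow.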
For the point-light-source marker observed at 5 m, increasing turbidity (from clear to moderate) together with increasing water depth (0–5 m) resulted in a pronounced reduction in apparent brightness and a progressive loss of edge definition of the luminous spot; ultimately, only coarse localization was possible and a well-defined boundary could not be resolved, as illustrated in Figure 11. For the QR-code marker observed at 1 m, a monotonic decrease in black–white contrast and gradual blurring of the internal pattern were observed as turbidity and depth increased; under moderate turbidity conditions at 5 m depth, the module boundaries became difficult to distinguish, leading to a substantial decline in recognition performance, as shown in Figure 12. In summary, both point-type and planar markers exhibited the same degradation tendency under these conditions, with environmental deterioration causing brightness attenuation, reduced contrast, and increased recognition difficulty.
Taken together, the experimental results show that the proposed underwater optical-environment simulator reproduces—over a broad range of settings—the way turbidity and water depth degrade the imaging quality of both point and planar visual landmarks and that its overall trends match those observed in real underwater conditions. This supports the validity of the turbidity and light-attenuation modeling adopted here and suggests that the simulator can supply reliable, controllable degradation samples for designing image-preprocessing steps and evaluating robustness in underwater visual localization, without resorting to detailed physical-optics solvers.
3.2.2. AUV Disturbance Rejection Capability Validation
Variations in fluid properties were treated as a principal external disturbance affecting AUV attitude stability and the accuracy of vision-based docking. To emphasize the AUV’s motion-level disturbance-rejection capability and to verify that the simulation environment could faithfully capture this process, a station-keeping scenario was implemented: the AUV was constrained to maintain a prescribed pose in three-dimensional space, and compensatory thrust was actively produced by omnidirectional thrusters in response to changes in the ambient flow field or fluid properties. Because the camera was rigidly mounted on the vehicle, its spatial pose was entirely determined by the AUV’s overall motion; consequently, the AUV’s ability to maintain stable motion under disturbance conditions was directly manifested as camera-pose stability during the vision-based docking task.
In the simulation tests, flow disturbances of varying directions and intensities were imposed, and the fluid-property parameters were adjusted to examine the response characteristics of the vertical thrusters, lateral thrusters, and the main propulsor, as shown in
Table 5. The results showed that both vertical disturbances that induced surfacing/sinking tendencies and horizontal disturbances that drove fore–aft or left–right deviations were counteracted by compensatory thrust generated in the direction opposite to the disturbance. Furthermore, variations in disturbance intensity were accompanied by corresponding, physically reasonable changes in thruster output. This qualitative relationship was consistent with the motion response of an AUV subject to flow-field interference in real underwater environments, suggesting that the simulated AUV exhibited engineering-realistic disturbance-rejection capability and could maintain relatively stable platform attitude and camera pose under perturbation conditions. The observations also corroborated the fidelity and effectiveness of the disturbance modeling in the simulation environment, supporting its use for credible evaluation of camera motion and steady-state characteristics in vision-based docking localization tasks under flow disturbance conditions.
3.2.3. Effects of Obstacle Occlusion on Visual Marker Detection Performance
In the occlusion-impact experiments, the Poisson-distributed occlusion model described above was used to progressively increase the number of small-volume obstacles within the simulated water region. Three density levels were considered by setting the obstacle count to 600, 6000, and 60,000, and the imaging performance of point-light markers and QR-code markers was compared, as shown in Figure 13.
As can be seen in Figure 13, when the number of obstacles is 600, occlusion blobs in the field of view are relatively sparse and in most cases appear only in background regions, exerting limited influence on the contours and details of both point-light and QR-code markers; target edges and internal structures remain largely clear and discernible. When the obstacle number increases to 6000, visible suspended particles increase markedly; more frequent local occluding highlights appear around the point-light marker, and portions of QR-code modules are occluded, leading to an evident loss of overall texture completeness. When the obstacle number further increases to 60,000, the image exhibits a strongly occluded state dominated by high-density particles; the light-spot contour of the point-light marker is often interrupted by occlusion blobs, with pronounced discontinuities in brightness distribution and even local disappearance, while the QR pattern shows large-area block occlusions and structural fragmentation, with blurred or even barely recognizable boundaries between the original black-and-white modules. The comparison between the two marker types suggests that, due to its smaller size and highly concentrated information, the QR code is more sensitive to local occlusions, and its pattern integrity degrades rapidly at medium-to-high obstacle densities; by contrast, the point-light marker has a sparser distribution of feature points and can still retain a relatively clear overall geometric layout under moderate occlusion conditions. Overall, the progressive increase in obstacle number significantly alters the imaging clarity and structural integrity of the markers, providing a controllable imaging scenario for subsequent analysis of recognition performance and robustness differences among visual algorithms under multi-level occlusion conditions.
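A homogeneous spatial Poisson occlusion field, conditioned on the realized obstacle count, reduces to placing obstacle centers i.i.d. uniformly in the water volume. A minimal sketch (function name, the box parameterization, and the seed are illustrative assumptions):

```python
import random

def place_obstacles(n, dims, seed=0):
    """Place n small-obstacle centers uniformly in an axis-aligned box of
    dimensions dims = (length, width, depth) in metres. This is a
    homogeneous spatial Poisson process conditioned on the count n."""
    rng = random.Random(seed)
    return [tuple(rng.uniform(0.0, s) for s in dims) for _ in range(n)]
```

Re-running the sampler with n = 600, 6000, and 60,000 regenerates the three density levels reproducibly, which is what makes the occlusion scenarios controllable.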
3.3. AUV Visual Docking Location Simulation Experiment
3.3.1. Visual Docking Location Experiment Workflow
To validate the applicability of the developed Webots-based AUV visual docking simulation platform to underwater visual localization modeling and evaluation, an underwater docking simulation experiment covering the complete visual localization pipeline was designed and implemented. The experiments were conducted in a pool-style simulated environment measuring 15 m (length) × 5 m (width) × 8 m (depth). The scene included a simulated AUV equipped with a monocular vision sensor and a docking target system composed of multiple types of visual markers [36]. Key environmental factors, namely illumination, turbidity, and occlusion, were parameterized to generate operating conditions with different visibility ranges and imaging qualities, under which the AUV performed a continuous visual localization task from the long-range approach to close-range docking [14]. Following a typical AUV visual docking workflow, the vision-dominated docking procedure was divided into four consecutive stages, as shown in Figure 14, focusing on the availability of visual cues at each stage, the resulting pose-estimation accuracy, and how localization performance supported the process:
(1) Initial pose adjustment and docking initiation: Based on the predefined trajectory and initial navigation information, the AUV adjusted its attitude and heading toward the docking area, such that the target was about to enter the camera’s observable range and basic observability was established for subsequent visual takeover.
(2) Long-range coarse visual localization using LED point lights: Once the point lights appeared in the camera view with clearly discernible contours, the target’s approximate direction and coarse relative bearing were inferred through contour detection and analysis of their image-plane distribution, and the stand-off distance was reduced to a range suitable for recognizing near-field markers.
(3) Short-range accurate pose estimation using QR-code markers: When the QR-code markers became clearly visible, their geometric structures were leveraged to solve the monocular relative pose, progressively reducing the pose error between the AUV and the docking device to enable fine approach.
(4) Final docking and relative-distance convergence: The AUV approached the docking device at low speed, while the remaining translational and attitudinal errors were driven to converge, with attention restricted to the terminal process in which the relative pose and distance approached zero, in order to assess whether monocular visual localization could guarantee relative distance and attitude accuracy in the final docking stage.
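The coarse bearing inference of stage (2) can be sketched under the pinhole model: the image-plane offset of a detected LED-spot centroid from the principal point maps to an angular direction relative to the optical axis. The helper name and signature below are illustrative, not the paper's implementation:

```python
import math

def led_bearing(u, v, fx, fy, cx, cy):
    """Coarse (horizontal, vertical) bearing angles, in radians, of an
    LED-spot centroid (u, v) relative to the camera optical axis, using
    pinhole intrinsics fx, fy, cx, cy."""
    return (math.atan2(u - cx, fx), math.atan2(v - cy, fy))
```

Averaging the bearings of several detected spots gives the approximate direction of the docking device used to reduce the stand-off distance before the QR-code stage takes over.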
In addition, to facilitate a comparison and validation of the simulation results, an ideal indoor experimental setup was constructed, and images of a physical QR code and an LED point-light landmark were acquired, as shown in Figure 15. The real images were incorporated primarily to examine whether the simulated imaging characteristics of the QR code and the point light source agree with those observed in practice, thereby indicating that the simulated visual observation sequence can faithfully represent the imaging features encountered during actual AUV docking.
Figure 14. AUV visual docking process.
Figure 15. The indoor visual docking process of an AUV.
To ensure consistency between simulation and experiment, the same vision platform and camera configuration were adopted in the physical tests as in the simulation environment. In particular, the onboard camera was set to a resolution of 640 × 480 pixels, identical to the simulated camera model, which is crucial for the visual localization task so that the image scale, field of view, and pixel-level measurement characteristics remain consistent. Moreover, the physical QR code and point-light landmarks were fabricated with exactly the same geometric dimensions as those defined in the simulation. With this design, the comparison between simulated and real underwater experiments was performed under strictly matched conditions, enabling us to directly assess whether the camera images acquired in the real underwater environment exhibit visual features and geometric cues that are consistent with those generated in the simulation.
3.3.2. Emulating Practical AUV Docking Visual Sequences
To emulate the characteristics of the visual observation sequence in practical AUV docking, six representative docking poses were arranged along the docking axis at 2 m intervals, and the corresponding simulated image sequence was obtained, as shown in Figure 16. In the simulation, the docking distance was defined as the Euclidean distance from the AUV camera optical center to the plane containing the docking-device markers [37]. The geometric parameters of the visual markers were set as follows: LED point-light marker length l = 0.5 m, emitting-region radius r = 0.08 m, and QR-code marker-block side length 0.2 m.
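The docking distance defined above, from the camera optical center to the marker plane, is a standard point-to-plane distance. A minimal sketch (helper names are illustrative):

```python
import math

def docking_distance(cam_center, plane_point, plane_normal):
    """Euclidean distance from the camera optical center to the plane
    containing the docking markers: |(p - q) . n| / ||n||, where q is any
    point on the plane and n its (not necessarily unit) normal."""
    diff = [p - q for p, q in zip(cam_center, plane_point)]
    dot = sum(d * n for d, n in zip(diff, plane_normal))
    norm = math.sqrt(sum(n * n for n in plane_normal))
    return abs(dot) / norm
```

Evaluating this at the six arranged poses yields the 2 m-spaced docking-distance sequence used to index the simulated images.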
Based on the above marker scale and the optical imaging conditions, two typical observation regimes (long-range and short-range) were investigated, with a docking distance of 5 m used as the boundary between them.
Figure 17 shows the visual docking acquisition results obtained with physical fiducial markers in an ideal indoor environment. Overall, the measured imaging characteristics of both the point light source and the QR code were in close agreement with the simulation results: the light-spot morphology and spatial distribution were well reproduced, and the QR code exhibited comparable edge definition and textural structure. A side-by-side comparison further indicated that, with the scale parameters and imaging settings held constant, the simulated “outdoor” docking visual scene produced imaging effects that were largely consistent with those observed in the ideal indoor condition [38,39]. In particular, the distance-dependent trends of the light-spot size and brightness, as well as the changes in QR-code recognizability and geometric deformation with viewing range, followed the same patterns in simulation and experiment. These results support the validity of the proposed simulation-based visual observation model for representing the imaging behavior of the point light source and QR code during AUV docking, and they underpin subsequent algorithm design and performance evaluation based on simulated data.
When the docking distance exceeded 10 m, the marker’s global outline was no longer clearly discernible because of underwater light attenuation and turbidity-induced scattering; only a faint, low-contrast target region remained visible, indicating that the system was operating near the recognition limit with the current marker scale and optical configuration. In the far-range interval of 6–10 m, QR-code texture information could not yet be resolved, whereas the LED point sources were comparatively salient against the background. The corresponding light spots exhibited sharp contours and a stable spatial arrangement, which provided the dominant visual cues for point-feature-based coarse localization and heading estimation. At close range (within 4 m), QR-code texture details became reliably resolvable, enabling the system to transition smoothly from LED-point guidance to QR-based accurate pose estimation and thus to achieve high-precision visual guidance in the terminal docking phase. The observed distance-dependent characteristics, with 5 m serving as the boundary between far and near ranges, were consistent with the intended system design. These results not only confirmed that the simulation-based image acquisition pipeline reproduced key distance-varying effects of underwater optical imaging but also delineated the effective operating intervals of the LED-dominant far-range guidance stage and the QR-texture-dominant near-range fine positioning stage, providing a basis for marker-parameter configuration.
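The distance-dependent regimes described above (operation near the recognition limit beyond 10 m, LED-dominant coarse guidance in the far range, and QR-based fine pose estimation at close range, with 5 m as the nominal boundary) can be summarized in a small stage selector. The thresholds mirror the text; the function and label names are illustrative:

```python
def guidance_stage(distance_m, near_far_boundary=5.0, recognition_limit=10.0):
    """Map the current docking distance to the dominant visual cue."""
    if distance_m > recognition_limit:
        # Only a faint, low-contrast target region remains visible.
        return "near-limit"
    if distance_m > near_far_boundary:
        # LED point sources drive coarse localization and heading estimation.
        return "led-coarse"
    # QR texture is resolvable: switch to accurate pose estimation.
    return "qr-fine"
```

A terminal-docking controller would transition through these stages monotonically as the distance shrinks, which is the smooth LED-to-QR handover the experiments confirmed.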
3.3.3. Visual Docking Location Experiment in Diverse Simulated Environments
To quantitatively evaluate the vision-based docking localization performance of an AUV under varying environmental conditions, and to validate the effectiveness and realism of the developed simulation platform at the visual-perception level, four comparative experiments were conducted within a unified simulation framework. Exploiting the platform’s parameterized configuration of illumination, water turbidity, occlusion, and disturbance flow velocity, the experiments examined the effects of four key factors—turbidity, depth-induced light attenuation, current velocity, and the number of randomly deployed obstacles—on the visual docking localization process. In each experiment, a controlled-variable design was adopted: while the other three environmental variables were held constant, only the target factor was varied and its disturbance intensity was increased stepwise, enabling a systematic assessment of the robustness of the monocular visual docking localization algorithm under representative conditions involving diverse illumination, multiple turbidity levels, severe occlusion, and complex flow fields. The AUV was required to complete a continuous monocular visual localization task from the long-range approach to close-range docking for different visibility ranges and imaging conditions. To better approximate the progressive degradation observed in real seas (“clear–moderately degraded–severely degraded”), sampling and evaluation were performed continuously from the moment the AUV entered the effective docking range, allowing the impacts of visual degradation on recognition performance and localization accuracy to be quantified.
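The controlled-variable design can be sketched as a parameterized environment configuration from which one-factor-at-a-time test conditions are generated. Field names and baseline values below are illustrative, not the platform’s actual API:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class EnvConfig:
    """One simulated environmental condition for a docking trial."""
    turbidity: str = "clear"   # clear / slightly turbid / moderately turbid
    depth_m: float = 1.0       # controls depth-induced illumination attenuation
    current_mps: float = 0.0   # lateral disturbance flow velocity
    n_obstacles: int = 0       # randomly deployed small obstacles

def one_factor_sweep(baseline, factor, levels):
    """Vary a single environmental factor while the others are held constant."""
    return [replace(baseline, **{factor: level}) for level in levels]

base = EnvConfig()
turbidity_runs = one_factor_sweep(
    base, "turbidity", ["clear", "slightly turbid", "moderately turbid"])
```

Each of the four experiments then corresponds to one such sweep (over turbidity, depth, current velocity, or obstacle count) against the same baseline.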
In the turbidity impact experiments, the flow velocity was set to 0 m/s, the water depth to 1 m, and the number of randomly distributed small obstacles to 0. Turbidity was then increased stepwise (clear water, slightly turbid, and moderately turbid). Both the absolute and relative errors of vision-based localization increased markedly with turbidity. In
Figure 18, a representative docking trial is reported, where the horizontal axis denotes the ground-truth distance between the AUV and the docking target and the vertical axis denotes the corresponding localization error. Although both error curves converged as the AUV approached the target, higher turbidity substantially slowed the convergence and resulted in a larger steady-state error. Furthermore, statistics over 100 consecutive docking trials with random initial positions indicated that the recognition success rate decreased significantly as turbidity increased, as presented in
Table 6. In particular, when the marker was changed from a high-contrast QR code to a low-SNR LED point light source, a step-like increase in localization error was observed, further validating the detrimental effect of turbidity on image feature extraction; this finding is consistent with practical underwater optical observation characteristics.
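The per-condition statistic reported above (recognition success rate over 100 docking trials with random initial positions) is a simple aggregation; the boolean-per-trial representation below is an assumption for illustration:

```python
def success_rate(trials):
    """Fraction of docking trials in which marker recognition succeeded.

    Each trial is a bool: True if recognition (and hence localization)
    succeeded for that randomly initialized docking run.
    """
    return sum(trials) / len(trials)

# e.g. 87 successes out of 100 trials under one turbidity level (illustrative)
rate = success_rate([True] * 87 + [False] * 13)
```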
In the depth (illumination attenuation) experiments, the flow velocity was set to 0 m/s, the turbidity was set to slightly turbid, and no random small obstacles were introduced. The depth was increased stepwise (0 m, 1 m, and 5 m) to simulate illumination attenuation with increasing depth. The results showed that localization error increased with depth, whereas the recognition success rate decreased accordingly. This trend was consistent with that observed in the turbidity experiments, indicating that illumination conditions directly affect visual localization performance.
Figure 19 presents the time history of localization error during a single docking process; at greater depths, the error exhibited stronger oscillations and required a longer time to converge. Statistics from 100 trials further revealed that increasing depth reduced the system’s tolerance to initial position deviations, and visual localization failed in some cases when the error exceeded the recognition range, demonstrating that depth-induced illumination attenuation degrades localization accuracy, as shown in
Table 7.
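The convergence behavior discussed above (stronger oscillations and longer settling at greater depth) can be quantified with a settling-time metric over the error history; the tolerance value and function name are illustrative assumptions:

```python
def settling_index(errors, tolerance):
    """First sample index after which the localization error stays
    within tolerance for the rest of the history; None if it never settles."""
    idx = None
    for i, e in enumerate(errors):
        if e > tolerance:
            idx = None          # excursion resets any earlier candidate
        elif idx is None:
            idx = i             # tentative settling point
    return idx

# Error history that oscillates before converging (illustrative values)
history = [0.9, 0.5, 0.3, 0.4, 0.15, 0.12, 0.08, 0.09]
i = settling_index(history, tolerance=0.2)
```

A deeper (more attenuated) condition would show a larger settling index or `None`, matching the reported failure cases when the error exceeded the recognition range.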
In the flow-velocity disturbance experiment, a test condition with a depth of 1 m, slight turbidity, and zero randomly placed small obstacles was used, and the lateral current velocity was increased stepwise (0, 0.5, and 1.0 m/s). The results showed that the visual localization error was only weakly affected by changes in flow velocity, indicating that image acquisition and feature extraction were largely insensitive to the current speed. In contrast, the recognition success rate decreased significantly as the flow velocity increased. As shown in
Figure 20, the lateral-velocity error in a single trial exhibited increasingly pronounced temporal fluctuations, suggesting that the current disturbance primarily affected the AUV dynamics by inducing heading oscillations and trajectory instability, thereby elevating the likelihood of visual recognition failure, as shown in
Table 8. These findings indicate that visual localization algorithms should be further optimized for dynamic flow environments to suppress instability induced by hydrodynamic disturbances.
In the obstacle-occlusion experiment, the water depth was set to 1 m, the turbidity condition was set to slightly turbid, and the flow velocity was set to 0 m/s. Within a 600 m³ simulated water volume, the number of small-volume obstacles was progressively increased (600, 6000, and 60,000). As shown in
Figure 21, for the successfully recognized cases, both the relative and absolute localization errors varied only marginally; however, as the obstacle count increased, marker failures emerged within specific distance intervals. Compared with QR-code markers, point-light-source markers maintained a longer effective recognition distance under severe occlusion conditions. This difference was attributed to the small physical size of QR codes, which made them highly sensitive to local occlusion and prone to decoding failure even at short range under partial blockage conditions, whereas point-light-source markers, with larger inter-feature spacing, preserved identifiable global geometry despite partial central occlusion, as shown in
Table 9. Overall, increasing the number of small-volume obstacles substantially reduced recognition success and narrowed the usable ranging range, indicating that future vision algorithms should further enhance occlusion-aware prediction and completion capabilities to achieve robust recognition across different marker types.
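The narrowing of the usable ranging range under occlusion can be expressed as the longest contiguous distance interval whose per-bin recognition success rate stays above a threshold. The bin layout, rates, and threshold below are illustrative:

```python
def usable_range(distance_bins, success_rates, threshold=0.9):
    """Longest contiguous run of distance bins meeting the success
    threshold; returns (min_dist, max_dist) or None if no bin qualifies."""
    best, start = None, None
    for i, rate in enumerate(success_rates):
        if rate >= threshold and start is None:
            start = i
        if start is not None and (rate < threshold or i == len(success_rates) - 1):
            end = i if rate >= threshold else i - 1
            span = (distance_bins[start], distance_bins[end])
            if best is None or span[1] - span[0] > best[1] - best[0]:
                best = span
            start = None
    return best

bins = [2, 4, 6, 8, 10]                 # docking distances in metres
rates = [0.98, 0.95, 0.92, 0.60, 0.30]  # per-bin success under heavy occlusion
rng = usable_range(bins, rates)
```

Comparing this interval across marker types would reproduce the observation that point-light markers retain a longer effective recognition distance than QR codes under severe occlusion.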
4. Conclusions
An AUV simulation platform was developed for underwater vision-based docking localization. The entire localization workflow was systematically modeled and simulated by accounting for key factors such as visual marker diversity, underwater optical imaging characteristics, and current-induced disturbances and occlusions. The platform incorporated parameterized turbidity and illumination models, configurable current and occlusion scenarios, and flexible configurations of multiple marker types, enabling controllable reproduction and broad coverage of complex underwater visual conditions, as well as standardized integration and benchmarking of docking localization algorithms. The results showed that the platform can effectively differentiate algorithm performance under varying illumination and turbidity, current disturbance, and severe occlusion conditions, providing a reliable and efficient environment for comparative evaluation and iterative refinement. In addition, it helps reduce the risk and cost of real-world underwater docking localization experiments and supports engineering-oriented deployment of related methods.
Although a simulation-based verification environment for underwater visual docking was developed in this work, with targeted extensions to optical modeling, fiducial-marker design, scene dynamics, and camera configuration, several limitations remain and should be addressed in future research.
- 1. The current platform was primarily designed for visual localization, and the hydrodynamic and control components were modeled in a simplified manner; complex fluid-dynamic effects and high-fidelity realizations of control laws were not fully represented, which limited its utility for controller design and performance assessment. More complete hydrodynamic and control models should therefore be incorporated to enable closed-loop co-simulation and joint optimization of visual perception and control strategies.
- 2. The optical and camera models were largely based on parametric approximations and may not adequately capture the highly variable underwater light field and the distribution of suspended particles observed in real marine environments, leaving a potential simulation-to-reality gap in rendered imagery. This gap could be reduced by refining light-transport and scattering models using measured data, while also considering multimodal sensor configurations and marker designs to improve robustness under challenging conditions.
- 3. The present environment included a relatively limited set of scene elements and target types, focusing mainly on dynamic occlusion and intermittent loss of markers and obstacles, and did not sufficiently cover diverse underwater structures, ecological factors, or long-term environmental variations. Future work should enrich both static and dynamic assets and incorporate environment-evolution mechanisms to build a higher-fidelity simulation testbed applicable to a broader range of underwater tasks.