Automatic Alignment Method of Underwater Charging Platform Based on Monocular Vision Recognition

To enhance the stealth and operational efficiency of unmanned underwater vehicle (UUV) charging, we propose an automatic alignment method for an underwater charging platform based on monocular vision recognition. This method accurately identifies the UUV number and guides the charging stake to insert smoothly into the charging port of the UUV through target recognition. To decode the UUV's identity information even in challenging imaging conditions, an encryption encoding method containing redundant information and an ArUco code reconstruction method are proposed. To address the challenge of underwater target location determination, a target location determination method is proposed based on deep learning and the law of refraction. The method can determine the two-dimensional coordinates of the target location underwater using the UUV target spray position. To meet real-time control requirements in the harsh underwater imaging environment, we propose a target recognition algorithm to guide the charging platform towards the target direction. Practical underwater alignment experiments demonstrate the method's strong real-time performance and its adaptability to underwater environments. The final alignment error is approximately 0.5548 mm, meeting the required alignment accuracy and ensuring successful alignment.


Introduction
Unmanned underwater vehicles (UUVs) play an irreplaceable role in various fields, serving as oceanic equipment suitable for underwater tasks. They are widely utilized for tasks including seafood fishing, subsea pipeline tracking, seafloor mapping, submarine cable laying, and marine resource exploration. However, as the scope of UUV missions continues to expand, the issue of their endurance has become a focal point. Due to the limited energy carried by UUVs, frequent charging becomes necessary. However, surface charging not only reduces the operational efficiency of UUVs and increases costs but also compromises their stealth capabilities during mission execution [1,2]. To address this challenge, underwater charging platforms have emerged, enabling UUVs to recharge without the need to surface. Currently, there is a wealth of research on UUV docking. Researchers have utilized navigation systems such as acoustics [3,4], optics [5], and electromagnetics [6,7] to guide UUVs into docking stations (DSs). However, to the best of our knowledge, there are few methods available for guiding the charging stake to accurately insert into the UUV's charging port after docking. Achieving automatic alignment of underwater charging platforms is a current trend in the development of underwater equipment technology and holds significant research and practical value.
Although there is limited research on automatic alignment of underwater charging platforms, the process of inserting the charging stake into the UUV's charging port can be conceptualized as a peg-in-hole assembly. Solutions to this problem can be broadly categorized into contact-based and non-contact-based methods. Contact-based methods [8] typically involve the end of the shaft contacting the plane where the hole is located, followed by using a force sensor to search for the hole's position on the plane. This approach is unsafe and can damage the outer surface of the UUV. Non-contact-based methods can be divided into methods based on laser alignment instruments, acoustic sensors, and vision sensors. The core components of a laser alignment instrument are a semiconductor laser that emits laser beams and a photoelectric semiconductor position detector that collects information about the position of the laser spot [9]. Precise alignment can therefore be achieved by installing laser alignment instruments on the hole axis. However, small underwater planktonic organisms cause light scattering, degrading the alignment accuracy, and suspended particles in the water obstruct the laser, preventing the alignment process from continuing. Due to the slow attenuation of sound waves underwater, acoustic sensors are widely used in various underwater positioning and navigation tasks [4], but their alignment accuracy is low in short-distance scenarios. Vision-based alignment, on the other hand, corrects alignment deviations through visual feedback, providing positioning and guidance for fragile or easily disturbed objects without physical contact [10]. It exhibits robustness in underwater environments and meets alignment requirements in terms of accuracy. Therefore, we adopt a vision-based non-contact alignment approach to guide the charging stake into the UUV's charging port.
Alignment operations using vision sensors have been widely studied in various fields. For instance, Fan et al. [11] proposed a laser-vision-sensor-based method for initial point alignment of narrow weld seams by utilizing the relationship between laser streak feature points and initial points. They obtain a high signal-to-noise image of the narrow weld seam using the laser vision sensor, and then calculate the 3D coordinates of the final image feature point and the initial point based on the alignment model. Finally, they control the actuator to achieve the initial point alignment. In another study, Chen et al. [12] developed an automatic alignment method for tracking the antenna of an unmanned aerial vehicle (UAV) using computer vision technology. The antenna angle is adjusted using the relative position between the center of the UAV image and the center of the camera image fixed on the antenna. The two image centers overlap during antenna alignment. Similarly, Jongwon et al. [13] designed a vision system that uses three cameras to locate the wafer's position for wafer alignment.
Underwater image processing techniques are of great importance for underwater charging platform alignment because underwater images suffer from low contrast, blurred edges, and blue-green color casts. Traditional underwater image processing techniques can be divided into two categories: image enhancement and image restoration. Image enhancement algorithms [14] include histogram equalization, white balance, Retinex, wavelet transform, etc. These algorithms can render underwater objects clearer by enhancing image contrast and denoising. Image restoration techniques recover images by solving for two unknown variables in the Jaffe-McGlamery (JM) [15,16] underwater imaging model: the transmission map and the background light. For example, the dark channel prior (DCP) algorithm proposed by He et al. [17,18] simplifies the JM model by introducing the prior knowledge that the dark channel value of a clear, fog-free image is close to zero. Variants of the DCP algorithm [19][20][21][22] have been developed and optimized over time, achieving better results. In recent years, convolutional neural networks have made remarkable achievements in multiple fields such as image classification [23], object detection [24], and instance segmentation [25]. Increasingly, many networks are being used to process underwater images. Some of these networks are end-to-end [26][27][28], outputting the recovered image directly from the original image, while others use deep learning to derive some of the physical parameters of the underwater imaging model and then perform image restoration [29]. These learning-based methods perform well and are robust but are not suitable for situations with limited hardware resources.
Accurate pose estimation is a primary prerequisite for successful alignment. Pose estimation technology recovers the position and orientation of an object by observing the correspondence between its image and features [30], which can be divided into three types: corners, lines, and ellipses (circles). Luckett et al. [31] compared the performance of these three features and found that the accuracy and precision of corner and line features increase as the distance decreases, but in high-noise environments, ellipse features have the strongest robustness. To address the issue of ellipse detection accuracy, Zhang et al. [32] improved the circle-based ellipse detection method and designed a sub-pixel edge-based ellipse detection method. This improved the accuracy of ellipse detection, especially in cases where the ellipse is incomplete. It was the first to prove that improving the accuracy of ellipse edges helps to improve the detection accuracy of ellipses. Huang et al. [33] proposed a universal circle and point fusion framework that can solve pose estimation problems with various feature combinations, combining the advantages of both features with high accuracy and robustness. Meng et al. [30] proposed a perspective circle and line (PCL) method that uses the perspective view of a single circle and line to recover the position and orientation of an object, which solves the duality and restores the roll angle.
We propose an automatic alignment method for an underwater charging platform based on monocular vision recognition. After the UUV enters the underwater charging platform, the method accurately identifies its number and guides the charging platform to move towards the target direction using target recognition. This is achieved by calculating the deviation between the current position of the target keypoints and the target position, which aligns the charging stake with the UUV's charging port. The main contributions of this paper are as follows: 1. A single-camera visual-recognition-based UUV underwater alignment method is proposed that includes an encoding and decoding method for encrypted graphic targets, a method for determining the two-dimensional coordinates of the target location, and a target recognition algorithm, which together guide the charging stake on the charging platform to insert smoothly into the UUV's charging port.
2. The method can adapt to underwater environments and is robust to partial occlusion. Additionally, the method requires fewer computational resources, has lower hardware requirements and shorter processing times, and satisfies real-time control requirements. Moreover, its detection accuracy meets the requirements for smooth alignment.
The rest of this paper is organized as follows. Section 2 describes the proposed single-camera visual-recognition-based UUV underwater alignment method in detail. Section 3 presents the experimental results of the proposed method. In Section 4, we analyze the experimental results from Section 3 and describe the shortcomings of our method. Finally, Section 5 presents our conclusions.

Methods
The structure of the charging platform utilized in this method is illustrated in Figure 1, where both the camera and charging stake are attached to the axial sliding table, which is fixed on the circumferential turntable. Meanwhile, the UUV is secured onto the alignment platform. Due to the positioning of the alignment platform, the camera's distance from the UUV target at the target position is a known, fixed value. Consequently, the two-dimensional coordinates of the target's keypoints on the UUV remain unchanged at the target position. By comparing the two-dimensional coordinates of the target keypoints at the current position and the target position, the direction of motion of the charging stake can be determined. The proposed method comprises three stages, as shown in Figure 2. In the first stage, the UUV's identity information is decoded to obtain its number, which serves as an index to retrieve the registered information of the UUV within the charging platform. This information includes the UUV's charging voltage, size, and target spray position. Subsequently, the UUV is firmly clamped onto the alignment platform by the clamping device of the underwater charging platform. In the second stage, the UUV's size and target spray position are obtained based on the retrieved information, and the target position for docking is determined. This target position is the two-dimensional coordinate on the camera imaging plane where the keypoints of the UUV's target are located when the charging stake on the underwater charging platform can be inserted into the UUV's charging port. In the third stage, the keypoints of the UUV's target are recognized, and the charging stake is guided to move towards the target position by calculating the deviation between the current position and the target position.
The stake is first aligned circumferentially and then aligned axially until the distance between the current position and the target position is within the allowable error range, as shown in Figure 3. In Figure 3, the arrows indicate the rotational direction of the stake during circumferential alignment and the translational direction of the stake during axial alignment.
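The two-phase deviation-driven motion described above can be sketched as follows. The axis assignments (v for circumferential, u for axial), step size, and tolerance are illustrative assumptions, not the platform's actual control interface.

```python
def align(current, target, tol=1.0, step=0.1, max_iters=10000):
    """Drive the stake toward the target keypoint position.

    `current` and `target` are (u, v) image coordinates in pixels.
    Circumferential alignment (reducing the v deviation, an assumed
    mapping) finishes first, then axial alignment (u deviation),
    mirroring the two phases shown in Figure 3.  `tol` is the
    allowable error in pixels.
    """
    u, v = current
    tu, tv = target
    iters = 0
    # Phase 1: circumferential alignment -- reduce the v deviation.
    while abs(tv - v) > tol and iters < max_iters:
        v += step if tv > v else -step
        iters += 1
    # Phase 2: axial alignment -- reduce the u deviation.
    while abs(tu - u) > tol and iters < max_iters:
        u += step if tu > u else -step
        iters += 1
    return (u, v)
```

In the real system each step would command the turntable or sliding table and re-measure the keypoints; here the coordinates are simply updated in place.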

Encoding
Before UUVs can charge or exchange information with underwater charging platforms, their identity information should be determined to identify their model and charging voltage, and to confirm their mission type and ensure secure information exchange. Therefore, a UUV identity information encryption and coding method is necessary to ensure information security. Firstly, the UUV number is expanded to three digits, with leading zeros added if necessary. The UUV number is denoted as A1A2A3 (A1, A2, A3 ∈ [0, 9], A1, A2, A3 ∈ Z). Next, four coding values are obtained: A1A2, A2A3, A1A3, and A1 + A2 + A3. These coding values are used to query their corresponding ArUco codes, which are graphic codes obtained by converting the numeric codes. Finally, the four ArUco codes obtained from the previous step are rotated clockwise by 0°, 90°, 180°, and 270°, respectively, and their position information is added to each coding value through the ArUco code's pose information. The resulting encoding pattern is shown in Figure 4. This encoding method has some redundancy; when decoding, it is only necessary to recognize the position and ID of any two of the four coding values to infer the UUV number.
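The encoding and its redundancy can be sketched as follows. Plain integers stand in for the ArUco dictionary lookup (an assumption of this sketch); the rotation attached to each value tells the decoder which coding value it is looking at, so any two recognized pairs suffice.

```python
def encode(number: int):
    """Encode a UUV number (0-999) into four (value, rotation) pairs.

    The rotation (0, 90, 180, 270 degrees clockwise) identifies which
    of the four coding values each marker carries.  In the real system
    each value indexes an ArUco marker; integers stand in here.
    """
    a1, a2, a3 = (int(d) for d in f"{number:03d}")
    values = [10 * a1 + a2, 10 * a2 + a3, 10 * a1 + a3, a1 + a2 + a3]
    return list(zip(values, (0, 90, 180, 270)))

def decode(pair_a, pair_b):
    """Recover the UUV number from any two recognized (value, rotation) pairs."""
    known = {rot: val for val, rot in (pair_a, pair_b)}
    digits = [None, None, None]           # a1, a2, a3
    if 0 in known:                        # A1A2
        digits[0], digits[1] = divmod(known[0], 10)
    if 90 in known:                       # A2A3
        digits[1], digits[2] = divmod(known[90], 10)
    if 180 in known:                      # A1A3
        digits[0], digits[2] = divmod(known[180], 10)
    if 270 in known:                      # digit sum fills the one remaining gap
        missing = digits.index(None)
        digits[missing] = known[270] - sum(d for d in digits if d is not None)
    return 100 * digits[0] + 10 * digits[1] + digits[2]
```

For example, `encode(427)` yields the pairs `(42, 0)`, `(27, 90)`, `(47, 180)`, `(13, 270)`, and any two of them decode back to 427.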


Decoding
Due to the challenging imaging conditions, the ArUco codes in the original images cannot be recognized directly. Therefore, this paper proposes a method for ArUco code detection, which involves image restoration, thresholding, and template-filling techniques to reconstruct the ArUco codes in the respective regions. The specific steps are illustrated in Figure 5.

In the first step, the original image is subjected to image restoration. The core of the image restoration method involves solving for two unknowns, t(x, λ) and B∞(λ), based on the underwater imaging model represented by Equation (1). We adopt the method proposed in [19] and use the difference between the red channel and the maximum value of the blue and green channels to estimate the transmission of the red channel. The transmission maps of the blue and green channels are then obtained based on statistical analysis [34], as shown in Equation (2). Furthermore, we apply the gray world assumption theory [9] in the field of image restoration to estimate the background light, as shown in Equation (3). By substituting the calculated values of t(x, λ) and B∞(λ) into Equation (1), the restored image can be obtained.
where Ω is a local patch in the image.
where M is a constant value that represents the desired mean gray value of the restored image, which is set to 0.5 in this paper.
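The restoration step can be sketched as follows. The per-channel transmission relations and clipping bounds below are illustrative placeholders, not the statistics from [19,34]; only the overall structure (red-channel transmission estimate, gray-world background light, inversion of the imaging model I = J·t + B∞·(1 − t)) follows the text.

```python
import numpy as np

def restore(img):
    """Rough sketch of the restoration step for an RGB image in [0, 1].

    The red-channel transmission is estimated from the difference
    between the red channel and the max of green/blue; the green and
    blue transmissions are derived from it (placeholder exponents);
    the background light is a gray-world per-channel mean; finally
    the imaging model is inverted.
    """
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    # Red transmission: a large red deficit implies strong attenuation.
    t_r = np.clip(1.0 - (np.maximum(g, b) - r), 0.1, 1.0)
    # Green/blue transmissions derived from t_r (illustrative exponents).
    t = np.stack([t_r, t_r ** 0.8, t_r ** 0.9], axis=-1)
    # Gray-world estimate of the background light B_inf per channel.
    B = img.mean(axis=(0, 1), keepdims=True)
    # Invert I = J * t + B * (1 - t)  =>  J = (I - B * (1 - t)) / t.
    J = (img - B * (1.0 - t)) / t
    return np.clip(J, 0.0, 1.0)
```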
In the second step, the restored image is subjected to a thresholding operation. Although the contrast of the restored image has been improved, it still does not meet the recognition criteria for ArUco codes. Traditional methods utilize contrast enhancement and image binarization techniques to assist in ArUco code recognition. However, these operations can amplify the noise in the image and result in the failure of ArUco code recognition. Therefore, this paper proposes an improved approach to local thresholding, as shown in Equation (4). This method creates small window w 1 and large window w 2 around each pixel in the image. It compares the mode of the pixel grayscale values in w 1 with the average grayscale value of the pixels in w 2 and returns the grayscale value of the pixel at the center. Compared to traditional binarization methods, this approach has the advantage of using the mode of the grayscale values in a small window centered around each pixel for comparison, which helps with noise reduction. Additionally, it categorizes all pixels into five classes instead of two (0 and 255), resulting in smoother transitions between grayscale values and enhancing the robustness of the ArUco code recovery process in the third step.
where w1 and w2 are sliding windows with radii of dw1 and dw2, as shown in Figure 6. In this paper, dw2 = 10dw1.
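The mode-vs-mean local thresholding can be sketched as follows. The exact five-level mapping is defined by the paper's Equation (4), so the banding rule used here is an assumption; the window structure (mode of a small window w1 compared with the mean of a large window w2) follows the text.

```python
import numpy as np

def local_threshold(gray, d_w1=1, d_w2=10):
    """Quantize each pixel into one of five gray levels.

    For each pixel, the mode of the grayscale values in the small
    window w1 is compared with the mean of the large window w2; the
    difference is banded into five output levels (assumed banding).
    """
    h, w = gray.shape
    out = np.zeros_like(gray, dtype=np.uint8)
    levels = np.array([0, 64, 128, 191, 255], dtype=np.uint8)
    for y in range(h):
        for x in range(w):
            w1 = gray[max(0, y - d_w1):y + d_w1 + 1, max(0, x - d_w1):x + d_w1 + 1]
            w2 = gray[max(0, y - d_w2):y + d_w2 + 1, max(0, x - d_w2):x + d_w2 + 1]
            vals, counts = np.unique(w1, return_counts=True)
            mode = vals[counts.argmax()]
            diff = float(mode) - float(w2.mean())
            # Assumed banding of the mode-mean difference into 5 classes.
            idx = int(np.clip((diff + 100) // 50, 0, 4))
            out[y, x] = levels[idx]
    return out
```

Using the mode of w1 rather than the center pixel suppresses isolated noise, and the five output levels give the smoother transitions the text describes.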
In the last step, the ArUco codes are reconstructed. After obtaining the thresholded image, the regions of interest (ROIs) containing the ArUco codes can be determined. Each ROI is then divided into 36 equally sized rectangles, with the central 16 rectangles containing the encoding information of the ArUco codes. The color of the corresponding blank positions in the template is determined based on the average grayscale value within each rectangle. By filling in the template, the reconstructed ArUco codes are obtained. Finally, the ID and angle of the reconstructed ArUco codes are identified.
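The grid-sampling part of the reconstruction can be sketched as follows. A simple mean-threshold rule stands in for the paper's template-filling procedure; the 6×6 grid with a 4×4 payload follows the text.

```python
import numpy as np

def reconstruct_bits(roi):
    """Recover the 4x4 ArUco payload from a thresholded ROI.

    The ROI is divided into a 6x6 grid of equal rectangles; the central
    4x4 cells carry the payload.  Each cell becomes 1 (white) or 0
    (black) according to whether its mean gray value exceeds the ROI
    mean (the real template-matching rule may differ).
    """
    h, w = roi.shape
    cell_h, cell_w = h // 6, w // 6
    bits = np.zeros((4, 4), dtype=np.uint8)
    thresh = roi.mean()
    for i in range(4):
        for j in range(4):
            # Skip the one-cell border: payload cells are rows/cols 1..4.
            cell = roi[(i + 1) * cell_h:(i + 2) * cell_h,
                       (j + 1) * cell_w:(j + 2) * cell_w]
            bits[i, j] = 1 if cell.mean() > thresh else 0
    return bits
```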

Determination of Target Position
Due to the complex underwater environment, fixing all UUVs to a position that allows the charging stake to be smoothly inserted into their charging port and recording the two-dimensional coordinates of the current target keypoints would require a lot of manpower and resources. Therefore, this paper proposes a method to determine the two-dimensional coordinates of the target keypoints underwater based on the target spraying positions on the UUV. This method first determines the above-water coordinates of the target position based on the target spraying position and then uses the law of refraction to determine the underwater coordinates.

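The refraction correction can be illustrated with Snell's law under a simplifying flat-port, thin-interface assumption; `underwater_u`, the single-axis treatment, and the chosen geometry are assumptions of this sketch, not the paper's full derivation.

```python
import math

def underwater_u(u_air, u0, f_px, n_water=1.33):
    """Map an above-water pixel coordinate to its underwater counterpart.

    The ray angle implied by the above-water pixel is refracted with
    Snell's law (sin(theta_air) = n_water * sin(theta_water)) at a
    flat port perpendicular to the optical axis, and the refracted ray
    is re-projected through the same focal length f_px.  u0 is the
    principal-point coordinate along the same axis.
    """
    theta_w = math.atan2(u_air - u0, f_px)                 # in-water ray angle
    s = max(-1.0, min(1.0, n_water * math.sin(theta_w)))   # clamp for asin
    theta_a = math.asin(s)                                 # apparent angle in air
    return u0 + f_px * math.tan(theta_a)
```

Points on the optical axis are unchanged, while off-axis points move farther from the principal point, which matches the apparent magnification of underwater objects.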

Above-Water Coordinates of the Target Position
The schematic diagram of the two-dimensional coordinates of the water surface target position is shown in Figure 7. The two-dimensional coordinates of the above-water target position refer to the coordinates of the keypoint of the target T in the o-uv coordinate system. During the UUV target spraying, the relative position between the keypoint of the target and the UUV charging port can be obtained, that is, the coordinates (X, Y, Z) of the keypoints of the target in the coordinate system O-XYZ. Since the target position of the underwater charging platform is the position where the charging stake can be inserted into the charging port of the UUV, the coordinates (X W , Y W , Z W ) of the keypoint of the target in the coordinate system O w − X W Y W Z W are (X, Y, Z + L). The value of L is determined by the type of UUV and can be obtained during the decoding process since the clamping device of the charging platform in this method will fix and move the UUV to a specific position. The process of converting the coordinates in the coordinate system O w − X W Y W Z W to the coordinates in the coordinate system o-uv can be regarded as the camera calibration process. The conversion process is shown in Equation (5), where the matrix [R, T; 0, 1] is the camera's extrinsic matrix, which represents the position relationship between the world coordinate system O w − X W Y W Z W and the camera coordinate system O c − X c Y c Z c , and the intrinsic matrix represents the transformation relationship between the camera coordinate system O c − X c Y c Z c and the image coordinate system o-uv. The parameters α x and α y represent the scaling factors in the x and y directions, respectively. The parameter f represents the camera's focal length, and (u 0 , v 0 ) represents the principal point of the camera, which is the coordinate of the camera's optical center in the image.
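The conversion can be sketched with a standard pinhole projection. The numeric intrinsics and extrinsics below are illustrative values, not the platform's calibration.

```python
import numpy as np

def project(point_w, K, R, T):
    """Project a world point to pixel coordinates: image ~ K [R|T] X.

    K is the 3x3 intrinsic matrix built from the focal scalings
    (alpha_x, alpha_y) and the principal point (u0, v0); R and T form
    the extrinsic transform from world to camera coordinates.
    """
    p_c = R @ np.asarray(point_w, dtype=float) + T   # world -> camera
    uvw = K @ p_c                                    # camera -> homogeneous image
    return uvw[:2] / uvw[2]                          # homogeneous divide

# Illustrative parameters (not the platform's calibration values):
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
R = np.eye(3)
T = np.array([0.0, 0.0, 2.0])
```

A point on the optical axis projects exactly to the principal point (u0, v0), which is a quick sanity check on any calibration.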

Figure 7. The schematic diagram of the two-dimensional coordinates of the above-water target position. In the diagram, T represents the spray position of the target on the UUV, S represents the position of the UUV charging port, and P represents the charging pile of the underwater charging platform. The O-XYZ coordinate system has the center of the UUV charging port as the origin, the O w − X W Y W Z W coordinate system has the center of the charging pile on the underwater charging platform as the origin, the O c − X c Y c Z c coordinate system has the camera center as the origin, and the o-uv two-dimensional coordinate system has the upper left corner of the image as the origin. L represents the distance between the UUV and the charging platform, which is determined by the UUV model and can be obtained during the decoding process. In this method, the UUV is fixed and moved to a specific position by the clamping device of the charging platform.
The camera's intrinsic parameters can be obtained and image distortion can be corrected using Zhang's calibration method [35]. However, due to the installation errors of the camera, it is necessary to accurately determine the camera's extrinsic matrix through further hand-eye calibration [36].
Inspired by neural network concepts, this paper transforms the camera calibration problem into estimating the function f in Equation (6). To achieve this, the movement of the slider is controlled and 25 photos are captured, as shown in Figure 8. For each photo, the coordinates of the four concentric circle centers in the world coordinate system (XiW, YiW, ZiW) and their corresponding coordinates in the image coordinate system (ui, vi) are recorded. This results in 100 samples, where (XiW, YiW, ZiW, 1) serve as data and (ui, vi) as labels. Fifty percent of the samples were used for training, and the remaining fifty percent for testing. A single-hidden-layer neural network without activation functions was constructed, as shown in Figure 9. The mean squared error (MSE) loss function was employed; the extrinsic parameter matrix estimate was used to initialize the first layer of the network, and the calibrated intrinsic parameters were used to initialize the second layer. The initialized network showed fast convergence with small oscillations and ultimately converged to a smaller loss value.

After training, the network weights consist of two matrices, Ê and Î, with dimensions of 4 × 3 and 3 × 3, respectively. Given a coordinate (XW, YW, ZW) in the world coordinate system, its corresponding coordinate in the image coordinate system can be calculated using Equation (7). It is worth noting that Ê and Î obtained from training do not have physical meaning, and the intermediate variables X̂c, Ŷc, and Ẑc in Equation (7) are not the coordinates of the point in the camera coordinate system.
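The calibration-as-regression idea above can be sketched in plain numpy. The synthetic world points, the initial matrices, and the learning rate below are illustrative assumptions, not the paper's data; the two linear layers and the hand-derived gradients through the perspective division mirror the structure described for Figure 9.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for the paper's samples: world points (X, Y, Z, 1)
# and pixel labels (u, v) produced by an assumed ground-truth camera.
P = np.c_[rng.uniform(-50.0, 50.0, (100, 2)), rng.uniform(300.0, 400.0, 100), np.ones(100)]
E0 = np.vstack([np.eye(3), [5.0, -3.0, 20.0]])                              # extrinsic-like layer, 4 x 3
I0 = np.array([[800.0, 0.0, 0.0], [0.0, 800.0, 0.0], [320.0, 240.0, 1.0]])  # intrinsic-like layer, 3 x 3
q = P @ E0 @ I0
labels = q[:, :2] / q[:, 2:3]            # perspective division -> (u, v)

# Two linear layers with no activation: initialize from the (perturbed)
# extrinsic estimate and from the calibrated intrinsics.
E = E0 + rng.normal(0.0, 0.05, E0.shape)
I = I0.copy()

losses, lr = [], 1e-8                    # tiny rate keeps both differently-scaled layers stable
for _ in range(2000):
    q = (P @ E) @ I
    uv = q[:, :2] / q[:, 2:3]
    res = uv - labels
    losses.append(float(np.mean(res ** 2)))                 # MSE loss
    # Hand-derived gradients back through the perspective division.
    g_uv = 2.0 * res / res.size
    g_q = np.zeros_like(q)
    g_q[:, 0] = g_uv[:, 0] / q[:, 2]
    g_q[:, 1] = g_uv[:, 1] / q[:, 2]
    g_q[:, 2] = -(g_uv[:, 0] * q[:, 0] + g_uv[:, 1] * q[:, 1]) / q[:, 2] ** 2
    h = P @ E
    E -= lr * (P.T @ (g_q @ I.T))        # layer-1 (extrinsic-like) update
    I -= lr * (h.T @ g_q)                # layer-2 (intrinsic-like) update
```

Because both layers are linear up to the final division, only their product is constrained by the data, which is consistent with the paper's remark that the trained Ê and Î carry no individual physical meaning.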

Underwater Coordinates of the Target Location
The imaging principle of the underwater camera is shown in Figure 10. In air, the light reflected by the target propagates in a straight line, and an object of size h is projected onto the camera imaging plane with size a. Underwater, however, due to the refraction of light between different media, an object of size h is projected onto the imaging plane with size b. According to Snell's law [37], as shown in Equation (8), since the refractive index of water is 1.333, α > θ and b > a.
n = sin α / sin θ (8)

where θ represents the angle of incidence, α represents the angle of refraction, and n represents the refractive index of the medium.

Figure 10. The imaging principle of the underwater camera. In the diagram, o represents the camera's optical center, f represents the camera's focal length, K represents the distance from the refraction surface (the outer glass of the camera) to the camera's optical center, and D represents the distance from the target to the refraction surface. Similar to L mentioned earlier, D is a known constant value determined by the positioning of the clamp and the structure of the underwater charging platform.
It can be inferred that the projection of underwater objects on the imaging plane can be obtained by magnifying the projection of objects above water, with the camera center as the projection center, by a certain factor. The magnification factor b/a can be obtained from Equation (9).
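The refraction relationship in Equation (8) can be sanity-checked numerically. A minimal sketch, assuming light passing from water (n = 1.333) toward the camera, with angles in degrees:

```python
import math

N_WATER = 1.333  # refractive index of water used in the paper

def refraction_angle(theta_deg, n=N_WATER):
    """Equation (8): sin(alpha) = n * sin(theta); returns alpha in degrees."""
    return math.degrees(math.asin(n * math.sin(math.radians(theta_deg))))

# The refracted ray bends away from the normal, so alpha > theta and b > a.
angles = {theta: refraction_angle(theta) for theta in (5.0, 15.0, 30.0)}
```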

Target Recognition and Instruction Provision
This method utilizes a target designed by Tweddle et al. [38], as shown in Figure 11. The target consists of four concentric circles with area ratios of 1.44, 1.78, 2.25, and 2.94, and the keypoints of the target are the centers of the four concentric circles. During the alignment process, the camera image is preprocessed first, followed by contour detection to identify circles that may originate from the target according to their area and roundness. Then, based on the area ratios, the concentric circles are matched, and the coordinate values of each concentric circle center are obtained. Finally, based on the relative position between the visible concentric circle center at the current position and the corresponding target concentric circle center, the charging station is guided to move until the coordinate difference between the current and target positions is less than the maximum allowable error, at which point the movement of the charging station is stopped. During the alignment process, the x-coordinate of the concentric circle center is compared first to guide the charging station to rotate tangentially for azimuthal alignment, followed by the y-coordinate to guide the charging station to move axially for axial alignment.

To address the problem of low contrast in underwater images, this paper employs the Niblack binary thresholding method [39]. It calculates the pixel threshold by sliding a rectangular window over the grayscale image [40], as shown in Equation (10). If the pixel value surpasses the threshold, it is set as foreground; otherwise, it is set as background. Because the Niblack binary thresholding algorithm can adaptively adjust the threshold based on local blocks, it can preserve more texture details in the image. However, due to its need for more computational resources and time to process images, the Niblack binary thresholding method is computationally expensive.

T = m + k · √((1/n) Σ_{pi∈Ω} (pi − m)²) (10)
where T represents the pixel threshold, Ω represents the sliding window, n represents the number of pixels in the window, p i represents the grayscale value of each point in the window, m represents the pixel mean value of all points in the window, and k represents the correction parameter.
To accelerate the processing speed of Niblack thresholding, this paper is inspired by the ResNet network [41] and adopts a structure similar to the bottleneck architecture that first reduces and then restores the size of the image. As the processing time required for Niblack thresholding is directly proportional to the size of the image, the size of the image is reduced to one quarter of its original size, and then Niblack thresholding is performed. Finally, the thresholded image is enlarged four times to restore its original size. Although the size of this image is the same as that of the original image, its information content is only one quarter of that of the original image, and its contour details are relatively blurred. Directly using this image for contour detection would lower the detection accuracy. Therefore, inspired by the coarse-to-fine idea in the LoFTR algorithm [42], this paper uses the image to determine the region of interest (ROI) and performs Niblack thresholding on the ROI of the original image. This significantly reduces the number of pixels processed by Niblack thresholding, ensuring the real-time performance of the algorithm.
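A compact sketch of this coarse-to-fine Niblack pipeline using numpy only; the window size, k, padding, and 2x downscale (one-quarter area) are illustrative choices, not the paper's tuned values:

```python
import numpy as np

def box_stats(a, w):
    """Local mean and std over a w x w window via integral images
    (edge windows are truncated to the image)."""
    H, W = a.shape
    r = w // 2
    ii1 = np.pad(a, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    ii2 = np.pad(a ** 2, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    y0 = np.clip(np.arange(H) - r, 0, H); y1 = np.clip(np.arange(H) + r + 1, 0, H)
    x0 = np.clip(np.arange(W) - r, 0, W); x1 = np.clip(np.arange(W) + r + 1, 0, W)
    def window_sum(ii):
        return ii[y1][:, x1] - ii[y0][:, x1] - ii[y1][:, x0] + ii[y0][:, x0]
    n = (y1 - y0)[:, None] * (x1 - x0)[None, :]
    m = window_sum(ii1) / n
    var = np.clip(window_sum(ii2) / n - m ** 2, 0.0, None)
    return m, np.sqrt(var)

def niblack(gray, window=15, k=-0.2):
    """Equation (10)-style rule: a pixel is foreground if it exceeds m + k * std."""
    g = gray.astype(np.float64)
    m, s = box_stats(g, window)
    return g > m + k * s

def coarse_to_fine_niblack(gray, scale=2, pad=8):
    """Threshold a downsized copy, take the foreground bounding box as the ROI,
    then run full-resolution Niblack only inside that ROI."""
    coarse = niblack(gray[::scale, ::scale])
    ys, xs = np.nonzero(coarse)
    if ys.size == 0:                        # nothing found: fall back to full frame
        return niblack(gray)
    y0 = max(ys.min() * scale - pad, 0); y1 = min((ys.max() + 1) * scale + pad, gray.shape[0])
    x0 = max(xs.min() * scale - pad, 0); x1 = min((xs.max() + 1) * scale + pad, gray.shape[1])
    out = np.zeros(gray.shape, dtype=bool)
    out[y0:y1, x0:x1] = niblack(gray[y0:y1, x0:x1])
    return out
```

The coarse pass touches only a quarter of the pixels, and the fine pass only the ROI, which is the source of the speedup the text describes.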
To address the problem of partial occlusion caused by bubbles and suspended particles in underwater images, this paper enhances the robustness of the algorithm by utilizing the redundancy of target information. During the alignment process, only one of the four concentric circles in the target needs to be identified, and then the two-dimensional coordinates of its center are compared with the center of the concentric circle corresponding to the target position area ratio, thus providing the motion instructions in both the circumferential and axial directions. If multiple concentric circle centers are detected, the center point of multiple circle centers is used for comparison.
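The redundancy rule can be expressed as a short sketch; the pixel tolerance and the tuple-based interface are hypothetical, but the logic (mean of surviving centers, x compared before y) follows the description above:

```python
import numpy as np

def reference_point(detected_centers):
    """Collapse whichever concentric-circle centers survived occlusion
    (at least one is required) into a single guidance point."""
    if not detected_centers:
        raise ValueError("no concentric circle detected in this frame")
    return tuple(np.asarray(detected_centers, dtype=float).mean(axis=0))

def motion_command(current, target, tol=2.0):
    """x is compared first (tangential rotation), then y (axial motion),
    matching the alignment order described above."""
    dx, dy = target[0] - current[0], target[1] - current[1]
    if abs(dx) > tol:
        return ("rotate_tangential", dx)
    if abs(dy) > tol:
        return ("move_axial", dy)
    return ("stop", 0.0)
```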

Result of the Decoding Experiment
In this paper, decoding experiments were conducted underwater. During the encoding process, assuming the UUV is numbered 123, its corresponding four-digit code would be 12, 23, 31, and 6. These four digits were then converted into ArUco codes and rotated by 0°, 90°, 180°, and 270°, respectively, resulting in the final encoded patterns. The ArUco code recognition result is shown in Figure 12.
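The encoding of UUV number 123 can be sketched as follows. The rotation-to-slot assignment is an assumption inferred from the discussion of the recognized IDs; it is not fully spelled out in this excerpt:

```python
def encode_uuv_number(number):
    """Redundant four-marker code for a three-digit UUV number A1 A2 A3:
    pairwise IDs A1A2, A2A3, A3A1 plus the checksum A1+A2+A3, each assigned
    a distinct rotation so a marker's slot can be told from its yaw."""
    a1, a2, a3 = (int(d) for d in f"{number:03d}")
    ids = [a1 * 10 + a2, a2 * 10 + a3, a3 * 10 + a1, a1 + a2 + a3]
    return list(zip(ids, (0, 90, 180, 270)))

def decode_from_partial(markers):
    """Recover the number from only the A3A1 marker (assumed 180-degree slot)
    and the checksum marker (assumed 270-degree slot), mirroring the
    occlusion case in Figure 12."""
    by_rot = {rot: marker_id for marker_id, rot in markers}
    a3, a1 = divmod(by_rot[180], 10)
    a2 = by_rot[270] - a1 - a3
    return a1 * 100 + a2 * 10 + a3
```

Any two of the three pairwise markers, or one pairwise marker plus the checksum, are enough to recover all three digits, which is the redundancy the method relies on.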
The proposed image restoration method is compared with image enhancement algorithms based on the gray world assumption theory [43], the UDCP algorithm [44], and the Shallow-UWnet method [28] based on end-to-end convolutional neural networks. The results are shown in Figure 13. Furthermore, the proposed thresholding method is compared with the Niblack and Bernsen methods [45]. Table 1 presents the detection results of ArUco codes after being processed by different thresholding methods.

Figure 14 shows the loss variation of the proposed network architecture and the network architecture proposed by Cao et al. [46] during the training and testing processes. The training configuration is as follows: there are a total of 100 samples, with labels of (ui, vi) and data of (XiW, YiW, ZiW, 1); half of them are training sets and the other half are test sets. The initial learning rate is set to 0.1, which decreases by half every 1000 epochs to optimize the training process. The MSE loss function is employed as the loss function during the training phase.
Training ends when the test loss does not decrease for 500 consecutive epochs.
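The stated schedule (initial rate 0.1, halved every 1000 epochs, early stop after 500 stagnant epochs) can be sketched as a replay over a recorded test-loss curve; the function and its interface are illustrative, not the paper's training code:

```python
def run_schedule(test_losses, init_lr=0.1, half_every=1000, patience=500):
    """Walk a test-loss sequence, halving the learning rate every `half_every`
    epochs and stopping after `patience` epochs without improvement.
    Returns (stop_epoch, final_lr)."""
    best, best_epoch, lr = float("inf"), 0, init_lr
    for epoch, loss in enumerate(test_losses):
        if epoch and epoch % half_every == 0:
            lr /= 2.0
        if loss < best:
            best, best_epoch = loss, epoch
        if epoch - best_epoch >= patience:
            break
    return epoch, lr
```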

Result of the Experiment on Underwater Target Position Calibration
Based on Figure 15, the two-dimensional coordinates of the underwater target position can be obtained by enlarging the two-dimensional coordinates of the above-water target position, with the projection of the camera center on the imaging plane as the center. The enlargement factor can be obtained from Equation (9). To determine the distance between the camera center and the refraction surface, as well as the projection of the camera center on the imaging plane, the projection of the same target on the camera imaging plane was recorded both underwater and above water, with the target located 200 mm from the refraction surface, as shown in Figure 15a. The corresponding keypoints A2A1, B2B1, C2C1, D2D1 were connected and extended, and their intersection point P was taken as the projection of the camera's optical center on the imaging plane. The average value of the ratios PA2/PA1, PB2/PB1, PC2/PC1, and PD2/PD1, which was 1.308, was used as b/a in Equation (9). By substituting D = 200 mm into Equation (9), the distance between the camera center and the refraction surface, K, was obtained as 10.618 mm.

Figure 15. Projection of the same target on the imaging plane underwater and above water. (a) When D = 200 mm, the same target's projection on the imaging plane is recorded underwater and above water and its keypoints are marked. In this figure, A1, B1, C1, D1 represent keypoints of the above-water image, and A2, B2, C2, D2 represent keypoints of the underwater image. (b) When D = 260 mm, the locations of the underwater keypoints are predicted from the calculated value of K and compared with the actual values. In this figure, A2, B2, C2, D2 represent predicted keypoints of the underwater image.
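Estimating P from the keypoint pairs amounts to a least-squares intersection of the lines A2A1, B2B1, C2C1, D2D1. A small sketch of that computation (the coordinates in the test are made up):

```python
import numpy as np

def intersect_lines(point_pairs):
    """Least-squares intersection of 2-D lines, each given by two points
    (e.g. an above-water keypoint and its underwater counterpart)."""
    A = np.zeros((2, 2)); b = np.zeros(2)
    for p, q in point_pairs:
        p, q = np.asarray(p, float), np.asarray(q, float)
        d = (q - p) / np.linalg.norm(q - p)
        M = np.eye(2) - np.outer(d, d)    # projector onto the line's normal
        A += M; b += M @ p
    return np.linalg.solve(A, b)
```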
Once the value of K is calculated, the camera's field of view (FOV) can be determined. The camera model used in this study is LI-IMX185MIPI-CS, with a sensor type of 1/1.9", which corresponds to a diagonal length of 1/1.9 inches. The aspect ratio of the image is 16:9. Therefore, the sensor size is 7.3396 mm × 4.1285 mm.
In air, using Equation (11), the camera's field of view is calculated to be 173.9073 mm × 309.1704 mm.
Wa = w(D + K)/f, Ha = h(D + K)/f (11)

where Wa and Ha represent the width and height of the FOV, and w and h are the width and height of the sensor, respectively. f denotes the camera's focal length. The definitions of K and D are illustrated in Figure 10.
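The Equation (11) numbers can be reproduced arithmetically. The focal length is not stated in this section, so f = 5 mm below is an assumption chosen because it matches the reported values exactly:

```python
# Sensor dimensions and distances from the text (mm); f is an assumption.
w, h = 7.3396, 4.1285
D, K = 200.0, 10.618
f = 5.0

W_a = w * (D + K) / f   # in-air FOV width
H_a = h * (D + K) / f   # in-air FOV height
```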
Underwater, due to the refraction of light between different media, the camera's FOV will be reduced. Using Equation (12), the camera's FOV is calculated to be 132.6537 mm × 235.8289 mm, where αh and θh represent the refraction angle and incident angle in the height direction of the sensor, and αw and θw represent the refraction angle and incident angle in the width direction of the sensor. These parameters are illustrated in Figure 10.
The accuracy of determining the underwater target position was verified as follows. The projection of the same target on the camera imaging plane was again recorded both underwater and above water, with the target located 260 mm from the refraction surface. With the value of K calculated above, b/a was determined as 1.313. Using P and b/a, the predicted two-dimensional coordinates of the underwater target position, A2, B2, C2, and D2, were obtained, as shown in Figure 15b. The error between the predicted and true values is presented in Table 2.
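Given P and b/a, predicting the underwater keypoints is a similarity scaling about P. A minimal sketch with made-up coordinates:

```python
import numpy as np

def predict_underwater(points, P, magnification):
    """Scale above-water keypoints about P (the optical-center projection)
    by the factor b/a to predict their underwater positions."""
    pts = np.asarray(points, dtype=float)
    P = np.asarray(P, dtype=float)
    return P + magnification * (pts - P)
```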

Result of Image Processing
We downsized the images by different scales, applied Niblack binarization, and then resized them back to their original size. The average processing time for each frame is shown in Table 3. The fluctuations of the keypoint detection results for the target at different scaling ratios over 90 consecutive frames are illustrated in Figure 16.

Result of the Actual Alignment Experiment
During the experiment, the first stage is the circumferential alignment process, in which the circumferential hydraulic cylinder moves until the difference between the x-coordinates of the current position and the target position is within the allowable error range. The second stage is the axial alignment process, in which the axial hydraulic cylinder moves until the difference between the y-coordinates of the current position and the target position is within the allowable error range. The third stage is the charging process, in which the hydraulic cylinder controls the charging stake to rise and insert into the UUV charging port. The distance between the camera and the alignment platform was 200 mm. The displacement changes of each hydraulic cylinder during the alignment process are shown in Figures 17-19, and the distance changes between the current position of the target and the target position are shown in Figures 20 and 21.
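The three-stage sequence can be caricatured as a toy control loop; the step size, tolerance, and units are arbitrary illustrative values, not the platform's control parameters:

```python
def align(current, target, tol=0.5, step=0.2):
    """Stage 1: move in x (circumferential), stage 2: move in y (axial),
    stage 3: raise the stake. Returns the final position and the stage log."""
    x, y = current
    log = []
    while abs(target[0] - x) > tol:            # circumferential alignment
        x += step if target[0] > x else -step
    log.append("circumferential_done")
    while abs(target[1] - y) > tol:            # axial alignment
        y += step if target[1] > y else -step
    log.append("axial_done")
    log.append("insert_stake")                 # charging stage
    return (x, y), log
```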


Discussion
The experimental results of underwater encoding are presented in Figure 12. Although some ArUco codes were not recognized due to occlusion, two ArUco codes were identified with IDs 31 and 6, and their yaw angles were 179° and −95°, respectively. Based on marker pose estimation, it can be inferred that ID 31 represents A₃A₁; therefore, A₁ = 1 and A₃ = 3. ID 6 represents A₁ + A₂ + A₃; thus, A₂ = 2. Decoding yields the UUV number 123.
As shown in Figure 13, the original image suffers from low contrast, blurry contours, and a color shift towards blue and green due to light absorption and scattering in the underwater environment. The method based on the gray world assumption theory [43] corrects the color shift but fails to enhance the image contrast. The UDCP method [44] intensifies the color shift towards green. The Shallow-UWnet method [28] enhances the image contrast but introduces an additional color shift towards yellow. In contrast, the proposed method in this paper corrects the color shift while enhancing image details. As shown in Table 1, by applying the proposed image restoration and thresholding methods, eight out of nine ArUco codes can be detected. When only the proposed image restoration method is applied, a maximum of four codes can be detected, while applying only the proposed image thresholding method can detect up to six codes. The best result of the remaining methods is achieved by combining the Niblack and the gray world assumption theory methods, which detects five codes. Therefore, both the proposed image restoration method and thresholding method effectively improve the detection rate of ArUco codes.
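For reference, the gray world assumption baseline compared above amounts to scaling each color channel so its mean matches the global mean. This is a sketch of that baseline correction only, not the paper's full restoration pipeline.

```python
import numpy as np

def gray_world(img):
    """Gray-world color correction: scale each channel so its mean
    equals the global mean, countering the blue-green color shift.
    `img` is an H x W x 3 float array in [0, 255]."""
    means = img.reshape(-1, 3).mean(axis=0)   # per-channel means
    gain = means.mean() / means               # per-channel gains
    return np.clip(img * gain, 0.0, 255.0)
```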
As shown in Figure 14, Cao's method [46] exhibits a fast convergence rate during the initial stages of training, but it also shows significant oscillations during the convergence process. In contrast, the proposed method in this paper has a slower convergence rate but demonstrates a stable convergence process, with the final loss value consistently reaching a smaller value. The minimum testing error achieved by Cao's method [46] is 0.3828 pixels, while the proposed method in this paper achieves a minimum training error of 0.0036 pixels. Figure 15b presents the difference between the predicted underwater target position and the actual underwater target position. The predicted underwater keypoints and the actual underwater keypoints overlap almost perfectly. The errors are shown in Table 2, with a maximum error of 5.21 pixels, which meets the accuracy requirements for alignment. Table 3 shows the average processing time per frame under different scaling ratios, and Figure 16 shows the data fluctuation under different scaling ratios. The algorithm proposed in this paper produces the same detection results as processing without scaling, with minimal data fluctuation, while the processing time is only 37.41% of the original. It can therefore meet the real-time requirements of the alignment control process.
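The scaling strategy behind the reduced per-frame processing time can be sketched as follows: run the detector on a downscaled copy and map the resulting keypoints back to full resolution. The detector interface `detect_fn` is hypothetical, and a simple stride-based downsampling stands in for a proper image resize.

```python
import numpy as np

def detect_scaled(img, detect_fn, scale=0.5):
    """Run a pixel-coordinate detector on a downscaled copy of `img`
    and rescale the keypoints to the original resolution."""
    stride = int(round(1.0 / scale))
    small = img[::stride, ::stride]               # cheap downscale
    keypoints = detect_fn(small)                  # (N, 2) in small coords
    return np.asarray(keypoints, float) * stride  # map back to full size
```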
The displacement changes of each hydraulic cylinder during the alignment process are shown in Figures 17-19. Finally, the hydraulic cylinder controls the piston rod to move up by 119 mm, indicating that the piston rod has successfully inserted into the hole on the UUV. As shown in Figures 20 and 21, the error between the final alignment position and the target position in the x direction is −0.69661 pixels, and in the y direction it is −0.58738 pixels. Therefore, considering the calibration errors in Figure 14, the underwater coordinate transformation errors in Table 2, and the motion errors in Figures 20 and 21, the maximum alignment error is 4.517 pixels. Based on the calculation of FOV using Equation (12), the maximum alignment error is 0.5548 mm, which meets the accuracy requirements.
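The pixel-to-millimetre conversion used above can be sketched under a pinhole camera model in which the horizontal field of view spans the image width at the working distance. Equation (12) in the paper may differ in detail; the FOV and resolution values here are illustrative, not the paper's parameters.

```python
import math

def pixel_error_to_mm(err_px, dist_mm, fov_deg, width_px):
    """Convert a pixel-space alignment error to millimetres, assuming
    the horizontal FOV spans `width_px` pixels at distance `dist_mm`."""
    width_mm = 2.0 * dist_mm * math.tan(math.radians(fov_deg) / 2.0)
    return err_px * width_mm / width_px
```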
Although the method proposed in this paper successfully achieves automatic alignment of the UUV underwater charging platform, there is still room for improvement. First, for partially occluded targets, this paper relies on redundant information in both the decoding and target recognition processes; accurately completing the occluded parts in real time would further enhance the robustness of the method. Second, in determining the camera's intrinsic and extrinsic parameters, this paper still depends on Zhang's calibration method to obtain the distortion coefficients and to initialize the network parameters with the calibrated intrinsic parameters, which is not a concise procedure. Finally, this paper does not explicitly express the camera's intrinsic and extrinsic parameters. These three points will be our future research directions.

Conclusions
This paper presents an automatic alignment method for UUV underwater charging platforms using monocular vision recognition. This method accurately identifies the UUV's identity information and guides the charging stake to smoothly insert into the charging port of the UUV through target recognition. To ensure the accuracy and robustness of decoding, this study introduces an encoding method based on redundant information and proposes an ArUco code reconstruction method specifically designed for underwater imaging environments for decoding purposes. Additionally, a method for determining the target position is proposed to overcome the difficulty of directly determining the underwater target position. The proposed method accurately determines the underwater two-dimensional coordinates of the target keypoints based on the location of the UUV target spray using deep learning and the law of refraction. The experimental results demonstrate that the proposed ArUco code reconstruction method can improve the detection rate of ArUco codes by at least 22.2%. The proposed target detection algorithm has an average processing time of 0.092 s per frame, meeting the requirements for real-time control. The maximum alignment error is 0.5548 mm, meeting the accuracy requirements for alignment.
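The physical relation underlying the underwater coordinate determination summarised above is the law of refraction (Snell's law). A minimal sketch, using the standard refractive indices of air and water; the paper's full mapping from the spray position to underwater coordinates involves the learned model and is not reproduced here.

```python
import math

def refracted_angle(incident_deg, n1=1.0, n2=1.33):
    """Snell's law, n1*sin(t1) = n2*sin(t2): return the refraction
    angle (degrees) for a ray passing from medium n1 (air) into
    medium n2 (water)."""
    s = n1 * math.sin(math.radians(incident_deg)) / n2
    return math.degrees(math.asin(s))
```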

Conflicts of Interest:
The authors declare no conflict of interest.