1. Introduction
The localization accuracy of autonomous underwater vehicles (AUVs) depends critically on reliable rotation estimation [1,2], which is a key factor in solving the loop-closure problem. In underwater environments with limited visibility, forward-looking sonar (FLS) serves as an effective alternative [3] because it can stably obtain two-dimensional (2D) images of the surrounding environment even under high-turbidity or low-illumination conditions. This capability makes FLS widely used in image-based navigation for underwater applications.
Recent studies on AUV motion estimation using sonar images can be broadly categorized into deep learning (DL)-based approaches and image registration-based approaches [4,5]. DL-based methods adapt neural network architectures originally designed for optical images to the characteristics of sonar imagery and estimate translation and rotation from consecutive FLS frames. Although these approaches can provide high estimation performance, their applicability to real environments is limited by difficulties in securing diverse training datasets, limited generalization to environmental variations in the field, and domain gaps between simulated and real sonar images.
To achieve generalizable performance across various seafloor environments, image registration-based methods are commonly adopted. Traditional feature-based approaches such as SIFT [6] and SURF [7] rely heavily on the extraction of distinct keypoints. However, FLS images exhibit low signal-to-noise ratios (SNRs) [8], making the stable detection of edges or corners difficult and resulting in unreliable feature matching [3,9,10,11]. For this reason, recent feature-based registration studies have explored the use of KAZE [12,13], which detects robust keypoints using a nonlinear scale-space representation. Although KAZE-based methods have been applied to motion estimation in sonar imagery, their performance is constrained by the limited presence of distinctive features in seafloor scenes and by inherent sonar image characteristics, such as speckle noise, perceptual ambiguity, and shadows.
Underwater applications often employ direct methods, which use image intensity information, to compensate for the rotation estimation errors of feature-based methods. A direct method does not rely on feature detection but instead operates on the raw image data. This allows for more robust rotation estimation even in sonar images, where precise feature extraction is challenging and noise levels are high. Representative direct-based techniques include the Fourier–Mellin transform (FMT) [14] and the direct-polar method [15].
However, owing to the characteristics of sonar images, limitations in robustness and accuracy remain. Rotation estimation continues to be a critical and challenging problem in sonar-based odometry because rotational errors have a greater influence on odometry estimation systems than longitudinal or lateral errors. Even minor initial errors can accumulate over time, leading to severe drift. Therefore, reducing the uncertainty in rotation estimation is crucial for improving the overall reliability of odometry systems, underscoring the need for novel approaches to overcome these limitations.
To address these challenges, this study proposes a rotation estimation method based on the Radon transform. The Radon transform is a mathematical operation that projects a 2D image along multiple directions and computes its line integrals along the projection beams, from which a sinogram can be obtained [16]. When an image rotates, its corresponding sinogram exhibits a horizontal shift, thereby allowing the rotation estimation problem between two images to be reformulated as a shift estimation problem between their sinograms. Conventional Radon-based rotation estimation methods apply edge detection to the image prior to the Radon transform or estimate rotation angles via a correlation analysis of the sinograms. For example, Chelbi et al. applied the Radon transform to edge-detected images and calculated the rotation angle from the resulting sinograms [17]. However, this approach is unsuitable for sonar images, where a low SNR makes clear edge detection difficult and consequently prevents accurate rotation estimation.
To overcome these limitations, this paper presents a rotation estimation method based on the Radon transform that requires no feature extraction. The proposed pipeline comprises the following steps. First, an adaptive region of interest (ROI) is applied to remove unnecessary peripheral information, and the Radon transform is performed to generate a sinogram. Second, gamma correction and threshold-based preprocessing are conducted to emphasize projection values with strong directional components in the sinogram. Finally, the rotation angle is estimated by accumulating the projection values along the distance axis and calculating the shift between the maximum peaks of the two sinograms. The robustness of the proposed method was verified through tank experiments.
3. Related Work
Various studies have presented rotation estimation methods based on FLS images; however, extracting distinct feature points is challenging owing to the characteristics of sonar imagery. Thus, feature-based approaches have significant limitations when applied to sonar images.
Recent studies have increasingly adopted direct-based approaches to address these issues [23,24]. Representative direct-based rotation estimation methods include the direct-polar and FMT methods. This section introduces recent studies on feature-based approaches and direct-based rotation estimation methods for sonar images.
3.1. Feature-Based Method
Sonar images have inherent limitations in achieving stable feature detection because of strong speckle noise and low contrast. To assess these challenges, several studies have applied various feature detection algorithms, such as SIFT, SURF, FAST, ORB, BRISK, F-SIFT, SU-BRISK, and KAZE [6,7,25,26,27,28,29,30], to sonar imagery and compared their performance [31]. These algorithms typically analyze intensity variations and structural boundaries to detect distinctive local regions, such as corners and edges. The resulting keypoints are then encoded as descriptors that remain invariant to rotation, scale, and illumination changes and are subsequently used to establish correspondences between image pairs.
In this process, the relative transformation between images is typically estimated by computing a homography or affine transformation matrix, while the Random Sample Consensus (RANSAC) algorithm is used to remove outliers and retain only reliable correspondences. However, owing to the uneven intensity distribution and low SNR of sonar images, these feature-based methods continue to face limitations in achieving accurate and reliable matching.
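As a concrete illustration of this pipeline, the following minimal sketch estimates the relative rotation between two frames using SIFT keypoints, Lowe's ratio test, and RANSAC-based homography fitting in OpenCV. The file names and the ratio and reprojection thresholds are illustrative assumptions, not values from the experiments reported here.

```python
# Minimal feature-based rotation estimate: SIFT + ratio test + RANSAC homography.
# Hypothetical inputs; sonar frames typically need contrast stretching first.
import cv2
import numpy as np

img1 = cv2.imread("frame_base.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("frame_rotated.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Lowe's ratio test discards ambiguous correspondences.
matcher = cv2.BFMatcher(cv2.NORM_L2)
matches = matcher.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]

src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)

# RANSAC rejects outlier matches while fitting the homography.
H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

# For near-planar, in-plane motion the upper-left 2x2 block behaves like a
# rotation matrix, so the angle can be read off directly.
theta = np.degrees(np.arctan2(H[1, 0], H[0, 0]))
print(f"estimated rotation: {theta:.2f} deg")
```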
3.2. Direct Polar Method
Raw sonar images obtained from the FLS are typically represented in a polar format with range r and azimuth angle θ. The relative rotational displacement along the azimuthal direction can be estimated by applying phase correlation to a pair of raw sonar images. The phase correlation method transforms the two input images into the frequency domain, computes the cross-power spectrum from the phase components of their complex Fourier spectra, and derives the correlation peak using the inverse Fourier transform. The distance (pixel displacement) between the image center and the correlation peak represents the relative positional change between the two images, corresponding to the azimuthal rotation angle in the raw sonar image (Figure 2) [3].
This method offers advantages in terms of simplicity and computational efficiency; however, it is limited to pixel-level precision. For example, in a representative high-resolution FLS device such as DIDSON, the 96 pixels along the azimuthal direction correspond to a field of view of approximately 29°, implying an angular resolution of approximately 0.3° per pixel. Therefore, the minimum detectable rotation angle obtained using the phase correlation method is approximately 0.3°, and smaller rotational variations may not be captured accurately. Hence, the lower bound of the rotation estimation error is determined by the pixel resolution, imposing a constraint on odometry applications that require precise attitude estimation.
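The sketch below illustrates the direct-polar idea with plain numpy: phase correlation between two raw polar frames, where a column shift along the azimuth axis maps to a rotation angle. The image sizes, the beam geometry (96 beams over a 29° fan), and the simulated shift are illustrative assumptions.

```python
# Phase correlation between two raw polar sonar frames (range x azimuth).
# A minimal sketch assuming pure azimuthal rotation, with azimuth on axis 1.
import numpy as np

def phase_correlation_shift(a, b):
    """Return the (row, col) displacement of image b relative to image a."""
    Fa, Fb = np.fft.fft2(a), np.fft.fft2(b)
    cross_power = Fb * np.conj(Fa)
    cross_power /= np.abs(cross_power) + 1e-12     # keep phase only
    corr = np.abs(np.fft.ifft2(cross_power))
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Map wrap-around FFT indices to signed shifts.
    return [p if p <= s // 2 else p - s for p, s in zip(peak, corr.shape)]

# Hypothetical geometry: 96 beams spanning a 29 deg fan -> ~0.3 deg/beam.
DEG_PER_BEAM = 29.0 / 96.0
raw0 = np.random.rand(512, 96)           # placeholder polar image
raw1 = np.roll(raw0, 10, axis=1)         # simulate a 10-beam rotation
_, beam_shift = phase_correlation_shift(raw0, raw1)
print(f"estimated rotation: {beam_shift * DEG_PER_BEAM:.2f} deg")
```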
3.3. Fourier–Mellin Transform
Transforming a raw sonar image into Cartesian coordinates produces a fan-shaped image. This type of image is represented in Cartesian form as I(x, y), and the FMT is one of the most widely applied approaches for estimating rotational changes in such images. The FMT converts the image into a log-polar coordinate system and performs phase correlation in the frequency domain, enabling the simultaneous estimation of rotation and scale variations (Figure 3). Owing to the difficulty of feature extraction from sonar images, the FMT has gained attention as a direct-based rotation estimation method.
However, the FMT involves multiple stages of transformation, which introduces certain limitations. The primary problem is interpolation error [32,33,34]. During the polar coordinate transformation, the Cartesian image is resampled into the log-polar coordinate system, where strong edges or fine structures can become distorted or lost. Such distortions obscure the location of the correlation peak in the phase correlation results, thereby degrading the accuracy of the rotation estimation.
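A minimal sketch of FMT-style rotation recovery, assuming scikit-image is available, is shown below. The number of angular bins is an illustrative choice, and this sketch recovers rotation only (the scale factor would come from the radial shift).

```python
# Fourier-Mellin-style rotation recovery, sketched with scikit-image.
# The FFT magnitude is translation-invariant; after log-polar resampling,
# rotation becomes a shift along the angle axis, recoverable by phase
# correlation. Note: spectral symmetry leaves a 180 deg ambiguity.
import numpy as np
from skimage.registration import phase_cross_correlation
from skimage.transform import warp_polar

def fmt_rotation_deg(img1, img2, angular_bins=360):
    m1 = np.abs(np.fft.fftshift(np.fft.fft2(img1)))
    m2 = np.abs(np.fft.fftshift(np.fft.fft2(img2)))
    radius = min(m1.shape) // 2
    p1 = warp_polar(m1, radius=radius, scaling="log",
                    output_shape=(angular_bins, radius))
    p2 = warp_polar(m2, radius=radius, scaling="log",
                    output_shape=(angular_bins, radius))
    shift, _, _ = phase_cross_correlation(p1, p2)
    return shift[0] * 360.0 / angular_bins    # row shift -> degrees
```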
4. Method
The Radon transform projects an image along straight lines at multiple angles to compute the line integral in each direction, through which a sinogram can be generated. The sinogram represents the projection value as a function of the projection angle and is used for orientation analysis and pattern detection in images. With these characteristics, the Radon transform enables the rotation of objects in FLS images to be analyzed through the sinogram, even under low-SNR conditions where reliable feature extraction is difficult.
In this study, a procedure was developed for Radon transform-based rotation estimation suitable for sonar images (Figure 4). First, an adaptive ROI was defined based on the sonar image intensity information, and a sinogram was generated from this ROI. Next, a preprocessing step was applied to enhance the high projection values in the sinogram. Finally, the rotation angle was estimated by calculating the shift between the high projection values of the two sinograms. This section describes the overall processing procedure of the proposed method and its mathematical formulation in detail.
4.1. Adaptive Region of Interest Calculation
The FLS images contained speckle noise (Figure 5a), making their direct use in processing stages (e.g., ROI selection) difficult. In this study, a block-averaging technique was applied to mitigate this problem. The image I(x, y) is divided into non-overlapping square blocks of size n × n to reduce the speckle noise using blockwise averaging. Each pixel in a block is replaced by the mean intensity value of that block to produce a smoothed image I_b(x, y). The index sets R_i and C_j define the row and column ranges of the (i, j)-th block, respectively, as expressed in Equations (1) and (2):

$$R_i = \{(i-1)n + 1, \ldots, in\}, \quad i = 1, \ldots, N_v, \tag{1}$$
$$C_j = \{(j-1)n + 1, \ldots, jn\}, \quad j = 1, \ldots, N_h, \tag{2}$$

where N_v and N_h denote the number of vertical and horizontal blocks, respectively. The entire image is partitioned into N_v × N_h non-overlapping rectangular regions B_ij, as shown in Equation (3):

$$B_{ij} = \{(x, y) \mid x \in R_i,\; y \in C_j\}. \tag{3}$$

The block-averaged image I_b(x, y) is obtained by assigning the mean intensity of each block B_ij to all the pixels in that block (Figure 5b), as defined in Equation (4):

$$I_b(x, y) = \frac{1}{n^2} \sum_{(u, v) \in B_{ij}} I(u, v), \quad (x, y) \in B_{ij}. \tag{4}$$
Unlike a sliding window-based mean filter, this operation performs smoothing across the entire image block by block. Although some loss of fine structural details may occur, speckle noise can be effectively suppressed, facilitating threshold-based binarization and adaptive ROI extraction in subsequent processing stages.
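A minimal numpy sketch of this blockwise averaging is shown below; it assumes the image height and width are multiples of the block size n (crop beforehand otherwise).

```python
# Block averaging (Eqs. (1)-(4)): every n x n block is replaced by its mean,
# suppressing speckle at the cost of fine detail.
import numpy as np

def block_average(img, n):
    """Blockwise mean; assumes image dimensions are multiples of n."""
    H, W = img.shape
    blocks = img.reshape(H // n, n, W // n, n)
    means = blocks.mean(axis=(1, 3))          # N_v x N_h grid of block means
    return np.kron(means, np.ones((n, n)))    # paint each mean over its block
```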
The intensity distribution of the sonar image was analyzed to extract meaningful regions from the block-averaged image. In general, sonar images contain information-rich regions and areas with little helpful information, and this distribution varies with the altitude, tilt angle, and range of the sonar. In this study, instead of using external sensor information, regions with high intensity values were defined as potential object areas.
When an object is present in the sonar image, the emitted acoustic signal is reflected from the seafloor and the object surface before being received. The reflection from the object surface is typically stronger than other reflections, forming a highlight region in the image, followed by a shadow region caused by acoustic occlusion (Figure 6). Owing to these acoustic characteristics, regions with high pixel intensities are considered candidate areas where objects are likely to exist.
Therefore, in this study, regions with high intensity values in the sonar image were defined as potential object areas based on the highlight and shadow patterns and were designated as the ROI. Pixels with intensity values greater than or equal to the threshold T were considered object candidates and binarized to generate a binary mask M(x, y) representing the ROI (Figure 5c). The mask is defined in Equation (5):

$$M(x, y) = \begin{cases} 1, & I_b(x, y) \geq T, \\ 0, & \text{otherwise}, \end{cases} \tag{5}$$

where T is empirically set to the mean of the top 20% intensity values in the sonar image because the target object protrudes from the seabed and therefore exhibits a clear intensity contrast with the background.
Subsequently, the center coordinates (x_c, y_c) and the area A were calculated from the binary mask. The radius r of the circular mask was defined based on the area (Figure 5d), as defined in Equation (6):

$$r = \alpha \sqrt{\frac{A}{\pi}}, \tag{6}$$

where α denotes an offset coefficient applied to expand the radius so that the actual ROI can be fully included. Finally, a circular mask is used to extract the ROI. As obtained from Equations (7) and (8), a circularly cropped ROI image I_c(x, y) was generated (Figure 5e), which was employed to generate the Radon transform-based sinogram:

$$C(x, y) = \begin{cases} 1, & (x - x_c)^2 + (y - y_c)^2 \leq r^2, \\ 0, & \text{otherwise}, \end{cases} \tag{7}$$
$$I_c(x, y) = I(x, y)\, C(x, y). \tag{8}$$
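The following sketch chains Equations (5)-(8) together in numpy. The offset coefficient alpha is a hyperparameter; the value 1.2 is purely illustrative.

```python
# Adaptive circular ROI (Eqs. (5)-(8)), computed from the block-averaged
# image and applied to the raw image.
import numpy as np

def circular_roi(img_blk, img_raw, alpha=1.2):
    T = img_blk[img_blk >= np.percentile(img_blk, 80)].mean()  # top-20% mean
    mask = img_blk >= T                       # Eq. (5): binary object mask
    ys, xs = np.nonzero(mask)
    yc, xc = ys.mean(), xs.mean()             # mask centroid
    A = mask.sum()                            # mask area in pixels
    r = alpha * np.sqrt(A / np.pi)            # Eq. (6): radius from area
    Y, X = np.indices(img_raw.shape)
    circle = (X - xc) ** 2 + (Y - yc) ** 2 <= r ** 2   # Eq. (7): circle mask
    return img_raw * circle                   # Eq. (8): circular crop
```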
4.2. Sinogram Generation via Radon Transform
This section details the process of converting sonar images into sinograms using the Radon transform and presents a preprocessing method for sinograms to improve the rotation estimation robustness. The Radon transform projects beams at multiple angles and computes the line integrals along these beams (Figure 7). In this study, the Radon transform was applied to FLS images, which are characterized by a low SNR and a lack of distinct local features, to analyze directional information and estimate the rotation. The Radon transform integrates the pixel values of an image along a straight line at a specific projection angle. Given an input sonar image I(x, y), the Radon transform R(ρ, θ) is expressed by Equations (9) and (10):

$$R(\rho, \theta) = \iint I(x, y)\, \delta(\rho - x\cos\theta - y\sin\theta)\, dx\, dy, \tag{9}$$
$$\rho = x\cos\theta + y\sin\theta, \tag{10}$$

where ρ represents the length of the normal passing through the origin, θ denotes the projection angle, and δ(·) indicates the Dirac delta function, ensuring that only pixels lying on the line corresponding to a specific ρ and θ contribute to the integral. The result of the Radon transform, R(ρ, θ), is visualized as a 2D function on the (θ, ρ) plane, referred to as a sinogram, where the horizontal and vertical axes correspond to the projection angle θ and distance ρ, respectively.
The sinogram contains information on the projection direction of the image and can be applied to analyze rotational variations. For FLS images with a low SNR and indistinct feature points, the Radon transform was applied to the preprocessed ROI images, and the rotational offset between two frames was obtained by calculating the θ-axis shift between their sinograms.
The conventional Radon transform is performed on rectangular images, causing diamond-shaped artificial patterns to appear in the resulting sinogram (Figure 8a). These artifacts, originating from the bright seafloor background within the square image, can introduce errors when estimating the rotation by computing the offset between the two sinograms. To address this problem, the proposed method employs the circularly cropped image I_c(x, y). Applying the Radon transform as defined in Equation (11) to this circular ROI generates a sinogram that minimizes the undesirable high-intensity artifacts caused by rectangular image boundaries (Figure 8b):

$$S(\rho, \theta) = \iint I_c(x, y)\, \delta(\rho - x\cos\theta - y\sin\theta)\, dx\, dy. \tag{11}$$
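In practice the sinogram can be computed with scikit-image, as sketched below. The toy ROI stands in for a real circularly cropped sonar frame, and the 0.1° angular step matches the resolution reported in Section 5.

```python
# Sinogram generation (Eq. (11)) with scikit-image. circle=True assumes the
# input is zero outside the inscribed circle, which the circular crop above
# guarantees.
import numpy as np
from skimage.transform import radon

roi_image = np.zeros((200, 200))
roi_image[80:120, 90:110] = 1.0            # placeholder bright object

theta = np.arange(0.0, 180.0, 0.1)         # 0.1 deg step, as in Section 5
sinogram = radon(roi_image, theta=theta, circle=True)
# sinogram.shape == (num_rho, len(theta)): rows are rho, columns are theta.
```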
4.3. Sinogram Preprocessing
Although the sinogram S(ρ, θ) contains information on the projection direction, low contrast and speckle noise can weaken its meaningful components. In this study, gamma correction, a type of power-law transformation, was applied (Figure 9) [35] to selectively enhance regions with high projection values. A larger γ value nonlinearly emphasizes high-intensity projection components while suppressing low-intensity components. In this study, the γ value was experimentally determined as a hyperparameter. When γ > 1, it further enhances the high-projection regions (Equation (12)) as follows:

$$S_{\gamma}(\rho, \theta) = \left( \frac{S(\rho, \theta)}{\max S} \right)^{\gamma}. \tag{12}$$
After gamma correction, binarization was performed using a high threshold value τ, which was also experimentally determined as a hyperparameter. This process enabled the extraction of strong reflection patterns in binary form (Figure 10), as expressed in Equation (13):

$$B(\rho, \theta) = \begin{cases} 1, & S_{\gamma}(\rho, \theta) \geq \tau, \\ 0, & \text{otherwise}. \end{cases} \tag{13}$$
The resulting binary mask retains only high-projection components, which minimizes the influence of noise and enables more robust rotation estimation when computing the relative offset between two sinograms.
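A compact numpy sketch of this preprocessing step is shown below; the gamma and tau defaults are illustrative placeholders, not the hyperparameter values tuned in this study.

```python
# Sinogram preprocessing (Eqs. (12)-(13)): normalize, apply gamma correction
# to emphasize strong projections, then binarize.
import numpy as np

def preprocess_sinogram(sino, gamma=3.0, tau=0.6):
    norm = sino / (sino.max() + 1e-12)        # scale to [0, 1]
    boosted = norm ** gamma                   # Eq. (12): gamma > 1 boost
    return (boosted >= tau).astype(float)     # Eq. (13): binary mask
```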
4.4. Rotation Estimation
The rotation estimation process comprises the following steps. Based on the preprocessed sinogram obtained via gamma correction and thresholding, the projection value distribution for each projection angle θ is computed by integrating along the distance axis ρ. Thereafter, the resulting distribution is used to estimate the rotation angle. The preprocessed sinograms corresponding to the two images are defined as B_1(ρ, θ) and B_2(ρ, θ). These sinograms contain only high-projection components after gamma correction and binarization, thereby emphasizing the reflection energy at each θ. Accumulating each sinogram along the ρ direction yields a 1D signal that represents the overall energy distribution according to the projection angle, as expressed in Equation (14):

$$P_i(\theta) = \sum_{\rho} B_i(\rho, \theta), \quad i = 1, 2, \tag{14}$$

where P_1(θ) and P_2(θ) represent the total projection values at each angle θ for the two images. This operation compresses the 2D sinogram into a 1D representation for each projection direction. The angle corresponding to the maximum value of each 1D signal is defined in Equation (15):

$$\theta_i^{*} = \arg\max_{\theta}\, P_i(\theta), \quad i = 1, 2. \tag{15}$$

The rotation between the two images can be estimated by obtaining the maximum values from Equation (15). The rotational difference is calculated as the relative offset between the two values. When the angular sampling resolution of the sinogram is denoted by k, the final estimated rotation angle is given by Equation (16):

$$\Delta\theta = k\left( \theta_2^{*} - \theta_1^{*} \right). \tag{16}$$
The Radon transform computes the line integral at a specific angle. Thus, accumulation over ρ represents the total reflected energy for that angle. Therefore, the location of the maximum peak in P(θ) is directly associated with the rotational orientation components in the image, and the difference in the peak positions between the two sinograms reflects the actual rotational offset (Figure 11).
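Equations (14)-(16) reduce to a few lines of numpy, as sketched below under the assumption that the sinogram columns sample θ every k degrees; the wrap-around handling for the 180° periodicity of the Radon transform is an implementation detail added here for robustness.

```python
# Rotation estimation (Eqs. (14)-(16)) from two preprocessed sinograms.
import numpy as np

def estimate_rotation(B1, B2, k=0.1):
    P1 = B1.sum(axis=0)                   # Eq. (14): accumulate over rho
    P2 = B2.sum(axis=0)
    d = np.argmax(P2) - np.argmax(P1)     # Eq. (15): peak index offset
    n = P1.size                           # angle bins spanning 180 deg
    d = (d + n // 2) % n - n // 2         # wrap offset into [-n/2, n/2)
    return k * d                          # Eq. (16): index offset -> degrees
```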
5. Experiment and Results
We conducted experiments in an indoor water tank to verify the performance of the proposed method. A turntable was installed at the bottom of the tank to apply rotational motion, allowing the test object to rotate to the desired angles. We employed a rectangular brick and an arrow-shaped model as experimental objects to evaluate the generalizability of the proposed algorithm and its rotation estimation performance (Figure 12).
This experiment employed a DIDSON 300 m FLS, with the tilt angle and detection range configured according to the conditions of the indoor water-tank environment. The tilt angle was adjusted to ensure that the objects on the tank floor were stably captured in the sonar image, and the detection range was set based on the object size and tank dimensions (Table 1).
We collected and analyzed the FLS images of two experimental objects, a rectangular brick and an arrow-shaped model, under various rotation angles to verify the effectiveness of the proposed adaptive ROI estimation method.
Figure 13 and Figure 14 show the three-stage results of the ROI estimation process for sequentially captured images taken from 0° to 40° at 10° intervals.
Each column corresponds to a different rotation angle of the same object. Row (a) corresponds to the raw FLS image, row (b) shows the block-averaged intensity map, and row (c) represents the final estimated adaptive ROI. The results in row (b) show an image in which speckle noise is suppressed through block-averaging operations, and regions likely to contain the object are visually emphasized based on high-intensity areas. In this image, the central high-intensity region corresponds closely to the actual reflective structure of the object.
Row (c) represents the circular ROI generated based on the centroid extracted from the block-based intensity distribution. Thus, the ROI removes unnecessary regions in the Radon transform, while preserving the essential object shape in the overall image. This approach demonstrates that the proposed method can be generalized for object recognition and centroid extraction even under rotational variations.
Rotation estimation experiments were conducted based on the proposed adaptive ROI method using sinogram analysis for various rotation angles. We applied the Radon transform to the extracted adaptive ROI of each image to generate the sinograms. We then computed the vertically accumulated projection values. Rotation estimation was conducted by comparing the positional difference between the peaks of the maximum accumulated projection values in the sinograms of the base and rotated images. The angular shift at the peak position directly reflects the rotational change in the image, and the extent of the shift can be converted into the actual rotation angle according to the angular interval parameter in the Radon transform. In this experiment, we estimated the rotations at intervals of 0.1° during sinogram generation, resulting in a rotation resolution of 0.1°.
This section presents the experimental results of sinogram-based rotation estimation for the rectangular brick and the arrow-shaped model.
Figure 15 and Figure 16 depict the vertically accumulated projection values of the sinograms from the base and rotated images, respectively. Each subplot compares the results by rotation angle, ranging from 0° to 40° at 10° intervals. The horizontal axis represents the projection angle of the Radon transform, and the vertical axis indicates the total accumulated projection value for each projection angle.
As observed, the peak positions gradually shifted to the right as the rotation angle increased, which is consistent with the direction of the applied rotation. For both objects, the peak position displacement displayed a linear shift pattern as the rotation angle increased, indicating that the sinogram domain accurately reflected the rotational movement.
For the rectangular brick, multiple peaks appeared owing to its relatively wide reflective distribution, resulting in a slightly diffused peak pattern during rotation estimation. In contrast, the arrow-shaped model produced a narrow, concentrated reflection region projected as a sharp, single peak in the sinogram. Nevertheless, in both cases, the accumulated projection value moved linearly with the rotation angle, demonstrating that the proposed adaptive ROI-based rotation estimation method performed consistently across object structures.
Table 2 and Table 3 show the performance comparison of rotation estimation algorithms at different rotation angles (10°, 20°, 30°, and 40°). The comparison includes the direct-based FMT method and classical feature-based approaches such as SIFT and SURF, as well as more recent feature-based techniques such as FAST and KAZE. The direct-polar method estimates rotation in the (r, θ) domain and is applicable only to pure self-rotation of the AUV; therefore, it was excluded from the comparison. The proposed method showed stable performance for both the brick and arrow-shaped objects across all rotation angles. As shown in Table 2, it estimated the rotations as 9.5°, 20.1°, 30.2°, and 41.9° for the applied angles of 10°, 20°, 30°, and 40°, resulting in a low average error of approximately 0.675°. In comparison, the FMT also succeeded at all angles but showed larger deviations of 4.78° at 20° and 1.08° at 30°, with an average error of 1.7175°. This variation is likely caused by interpolation errors from the log-polar transformation, which made its performance less consistent across angles. The performance of the feature-based algorithms was much more limited. SURF was able to estimate rotations up to 30° but failed at 40°, producing 37.15° at 30° with an error of more than 7°. SIFT failed completely at 30° and 40°, and FAST failed in all cases except 10°. KAZE succeeded up to 30° but failed at 40°, and its estimates of 13.78° at 10° and 32.31° at 30° resulted in an average error of more than 3°, indicating relatively low accuracy.
A similar trend was observed in the experiment with the arrow-shaped model, as shown in Table 3. The proposed method produced estimates of 9.3°, 17.1°, 28.1°, and 38.3° for the four rotation conditions, achieving the highest accuracy among all methods with an average error of 1.8°. Although the FMT successfully estimated all rotation angles, it exhibited a larger average error of 3.48°, indicating reduced accuracy. The feature-based algorithms also failed to estimate the rotations of the arrow-shaped model in most cases because of speckle noise and low contrast. SURF succeeded only at 10°, and SIFT and FAST likewise produced estimates only at 10°, with very low accuracy. KAZE was able to estimate all four rotation angles, but its results of 9.4°, 15.96°, 26.38°, and 35.93° yielded an average error of 3.08°, which was comparable to that of the FMT.
Overall, the quantitative results confirm that the proposed Radon transform-based rotation estimation method achieved the lowest errors and the highest success rates across all objects and rotation conditions. In contrast, the feature-based algorithms failed in most cases because the low contrast and speckle noise prevented reliable keypoint extraction, while the FMT showed inconsistent performance due to interpolation-related limitations. These findings indicate that the proposed method is effective for estimating the rotation of static objects in FLS imagery.
6. Conclusions
This study proposes a rotation estimation method for scenarios in which an AUV moves along a circular trajectory around static seabed objects with distinct heights. Considering the characteristics of sonar images, which are highly susceptible to speckle noise, block averaging was applied to reduce noise, and an adaptive ROI was computed by identifying high-intensity regions likely corresponding to object areas. The extracted ROI removed unnecessary peripheral information, enabling the generation of a more robust sinogram. Rotation estimation was performed by calculating the shift between the maximum peaks of the sinograms of the base and rotated images.
The proposed method was validated via tank experiments with two objects with distinct geometries (a rectangular brick and an arrow-shaped model), demonstrating consistent estimation results across rotation angles. In comparison, feature-based rotation estimation methods failed to extract reliable keypoints because of the low SNR of sonar images, leading to unsuccessful image registration and rotation estimation. Moreover, the FMT, a representative direct-based approach, estimated rotation but displayed reduced accuracy owing to interpolation errors during log-polar transformation.
Thus, the proposed method is an effective alternative for robustly estimating rotational information in environments where feature-based approaches are unreliable. However, owing to the characteristics of the Radon transform, the method has limitations when applied to non-rigid or dynamically moving objects. In future work, the proposed method will be applied to real seafloor sonar datasets and extended through integration with a rotation estimation-based SLAM frontend.