1. Introduction
Forward-looking imaging sonar, also called an acoustic camera, is an active sonar system that uses an array of transducers to generate acoustic waves at frequencies of approximately 1 MHz. These acoustic waves are focused and radiated within a specific range of elevation and azimuth, creating a two-dimensional image based on the intensity of the received echoes. This sensor is widely used in various underwater applications, as it can produce two-dimensional imaging results for underwater environments similar to those from an optical camera.
Acoustic cameras are one of the few imaging sensors that can be used in underwater environments and have attracted significant research attention in underwater robotics applications. However, underwater robotics research using acoustic cameras has primarily been conducted by a small number of institutions and universities due to the high cost of robotic platforms and acoustic cameras, the difficulty of accessing real sea environments, and the expense of building or using test tanks. For this reason, there has been increasing interest in developing simulators for acoustic cameras.
Simulating the signals of acoustic sensors typically requires modeling both the propagation of acoustic waves and the effects of various environmental factors, making it a complex and computationally intensive process. Fortunately, acoustic cameras have a relatively low signal-to-noise ratio and short detection range compared to other acoustic sensors, allowing certain processes in acoustic image simulation to be simplified. Accordingly, an acoustic camera simulator developed with these considerations in mind would be highly efficient and could be effectively utilized in underwater robotics applications.
With advancements in sonar technology, underwater sonar devices are now used in a variety of marine robotics applications, including target recognition, localization, mapping, and simultaneous localization and mapping (SLAM). Consequently, several sonar simulators have been developed for academic research purposes, and these can be categorized into three main types of simulation techniques for sonar imaging [1].
Frequency domain models use the Fourier transform of transmitted and received sound pulses [2]. Finite difference models solve the acoustic wave equation numerically to determine acoustic pressure [3], although this approach is computationally intensive. Ray tracing is a rendering technique that can produce high-quality images but requires substantial computational resources due to the complexity of tracing ray paths.
Techniques for simulating side-scan sonar images were presented in [4,5]. These techniques are based on ray tracing, allowing them to generate realistic sonar data, but at a high computational cost. The methods account for transducer motion and directivity characteristics, the refractive effects of seawater, and scattering from the seabed to produce synthetic side-scan images. Given the computational demands of ray tracing, some efforts have been made to find alternatives. One such alternative, called tube tracing, was proposed in [6]. Tube tracing uses multiple rays to form a footprint on a detected boundary, requiring less computation than ray tracing. The effectiveness of tube tracing for forward-looking sonar simulations was demonstrated in [7], where object irregularities and reverberation effects are also considered.
A forward-looking sonar simulator based on ray tracing was developed in [8]. This simulator uses distance information to the point where each ray intersects an object to generate a sonar image, though it does not incorporate an acoustic wave propagation model, resulting in relatively simple sonar images. Another approach to forward-looking imaging sonar simulation uses incidence angle information at ray–object intersection points [9]. In [1], ray tracing is combined with a frequency domain method.
A forward-looking sonar simulator was developed using the Gazebo simulator and the Robot Operating System (ROS) [10,11]. The simulator generates a point cloud through ray tracing in the Gazebo environment, which is then converted into a sonar image. To produce realistic sonar images, this approach considers object reflectivity as signal strength and applies various image processing techniques.
A GPU-based sonar simulation was proposed in [12], capable of producing images for two types of acoustic cameras: forward-looking sonar and mechanically scanned imaging sonar. This simulator uses ray distance and incidence angle information, but does not account for wave propagation and reverberation effects. A sonar simulator that incorporates the physical properties of acoustic waves was proposed in [13], treating acoustic wave propagation as an active sonar model and computing the received echo-to-noise ratio based on transmitted signal level, transmission loss, noise level, sensor directivity, and target strength.
A sonar simulator that combines ray tracing with style transfer based on a Generative Adversarial Network (GAN) was proposed in [14]. This simulator generates the sonar image using ray tracing, and a GAN-based style transfer then produces realistic sonar images from the simulated images. A point-based scattering model was introduced in [15] to simulate the interaction of sound waves with targets and their surrounding environment. While simplifying the complexity of target scattering, the model successfully generates coherent image speckle and replicates the point spread function. This approach, implemented within the Gazebo simulator, also demonstrates that GPU processing can significantly improve the image refresh rate.
The simulators introduced above have primarily focused on generating realistic acoustic images, which often requires sophisticated acoustic modeling and GPU-based computation. Acoustic camera simulators designed for this purpose are increasingly important tools in simulations for underwater robotics applications. Meanwhile, since simulated acoustic images can be utilized in underwater robotic applications such as underwater localization and manipulation, the development of an efficient acoustic camera simulator capable of running on the onboard computing devices of mobile robots has also become essential. A simulator designed for this purpose should therefore be able to operate even on low-spec computing devices and should avoid GPU processing, which consumes a significant amount of power. In response to this need, this paper proposes a simulator that can efficiently generate acoustic images without GPU-based computation, except for scene rendering.
Specifically, the proposed simulator efficiently approximates the acoustic beam as a set of acoustic rays using a ray-casting engine. An error model for the acoustic rays is then employed to generate realistic acoustic images. This paper is organized as follows. Section 2 describes the imaging geometry of acoustic cameras. Section 3 and Section 4 present the methods for simulating acoustic images. Section 5 and Section 6 discuss the results of extensive experiments conducted to evaluate the effectiveness and performance of the developed simulator, followed by a discussion of potential future directions. Finally, Section 7 presents the conclusions.
2. Imaging Geometry of Acoustic Camera
An acoustic camera emits sound waves in a specific beamforming pattern, determined by the wave frequency and an array of transducers. The beamforming directs acoustic waves to concentrate within a specified range of azimuth and elevation angles. Consequently, the space through which the acoustic waves propagate can be approximated as a fan-shaped region with specific azimuth and elevation angles and a maximum detectable range.
The acoustic camera can detect a large area at once using its transducer array; however, it cannot determine the elevation from which the acoustic waves are reflected. The only information available is the range and azimuth of the point where the acoustic waves are reflected.
Following the notation in [16], a 3-D point can be represented in two coordinate systems: rectangular coordinates $(X, Y, Z)$ and spherical coordinates $(R, \theta, \phi)$, in the object or world coordinate system. The point is also considered as a point where an acoustic wave is reflected (see Figure 1). Let $(x, y, z)$ denote the coordinates of a 3-D point in the acoustic camera coordinate system, with range $R$, azimuth angle $\theta$, and elevation angle $\phi$. The transformation between rectangular and spherical coordinates is as follows:
$x = R \cos\phi \sin\theta, \quad y = R \cos\phi \cos\theta, \quad z = R \sin\phi.$
The transformation matrix $\mathbf{T}$, consisting of the rotation matrix $\mathbf{R}$ and the translation vector $\mathbf{t}$, represents the transformation between the world (or object) coordinate system and the acoustic camera coordinate system.
In the acoustic camera coordinate system, a point on the image plane, which is coplanar with the plane formed by the x- and y-axes of the acoustic camera coordinate system, is determined solely by the range and azimuth angle (see Figure 2), where $\phi_{\max}$ is the maximum elevation angle at which acoustic waves can be radiated from the acoustic camera.
The acoustic image is represented in two different coordinate systems: one with symmetric units in meters and another with asymmetric units consisting of meters and azimuth angle. Let $\mathbf{I}_s$ and $\mathbf{I}_a$ denote the acoustic image in the symmetric coordinate system and the acoustic image in the asymmetric coordinate system, respectively. Thus, an image point can be expressed as $(x_s, y_s)$ for the acoustic image $\mathbf{I}_s$, or as $(\theta, R)$ for the acoustic image $\mathbf{I}_a$. The transformation between the two coordinates is given as follows:
$x_s = R \sin\theta, \quad y_s = R \cos\theta,$
where the discrete image coordinates in the two forms of the acoustic image are determined based on the resolution of each image.
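A minimal sketch of these coordinate conversions is given below. The axis convention (the z-axis as the elevation axis, azimuth measured in the x-y plane from the forward y-axis) and the function names are assumptions for illustration, not the simulator's actual implementation.

```python
import numpy as np

def rect_to_spherical(x, y, z):
    """Convert rectangular camera-frame coordinates to (range, azimuth, elevation)."""
    r = np.sqrt(x**2 + y**2 + z**2)
    theta = np.arctan2(x, y)   # azimuth, measured from the +y (forward) axis
    phi = np.arcsin(z / r)     # elevation, measured from the x-y plane
    return r, theta, phi

def spherical_to_image(r, theta):
    """Project a reflecting point onto the two acoustic-image representations;
    the elevation angle is discarded because the camera cannot observe it."""
    polar = (theta, r)                                   # asymmetric image: (azimuth, range)
    cartesian = (r * np.sin(theta), r * np.cos(theta))   # symmetric image, in meters
    return polar, cartesian
```

Discrete pixel indices are then obtained by scaling these continuous coordinates according to the chosen image resolution, as noted above.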
4. Software Implementation
The simulation of acoustic images requires a rendering engine for ray-casting. For this purpose, we use the Gazebo simulator, which is widely utilized in robotics research and supports both scene rendering and a physics engine. Additionally, the simulator is integrated with the Robot Operating System (ROS), enabling easy deployment in robotic applications.
The acoustic image simulation involves the following steps: constructing a virtual environment with a 3-D model, generating a 3-D point cloud through ray-casting, calculating the echo intensity for each ray connecting the virtual camera position to individual points based on the sonar equation, and rendering an acoustic image using the imaging model.
The 3-D point clouds are generated using a depth camera plugin that leverages the ray-casting engine of the Gazebo simulator. A graphical user interface (GUI) is integrated with the simulator, allowing real-time monitoring of acoustic images and real-time control of all parameters related to the depth camera and sonar equations (see Figure 4). Additionally, various types of noise can be applied to the acoustic images. The overall procedure is summarized in Algorithm 1.
Algorithm 1 Acoustic image simulation (Input: see Figure 1; Parameters: see Table 2, Table 3 and Table 4).
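To make the procedure concrete, the following sketch outlines one possible implementation of the simulation steps described above and summarized in Algorithm 1: a ray-cast point cloud is converted into a polar acoustic image of echo levels using a simplified active sonar equation (EL = SL − 2TL + TS, with TL = 20 log10 r + αr). The function name, the single-valued target strength, and the constant absorption coefficient are illustrative assumptions rather than the simulator's actual code.

```python
import numpy as np

def simulate_acoustic_image(depth_points, source_level_db=200.0,
                            target_strength_db=-10.0, alpha_db_per_m=0.3,
                            n_azimuth=106, n_range=150, max_range_m=9.0,
                            fov_deg=45.0):
    """Convert a ray-cast point cloud (N x 3, camera frame, y forward, z up)
    into a polar acoustic image of echo levels (simplified sonar equation)."""
    x, y, z = depth_points[:, 0], depth_points[:, 1], depth_points[:, 2]
    r = np.sqrt(x**2 + y**2 + z**2)
    theta = np.degrees(np.arctan2(x, y))      # azimuth of each ray, in degrees

    # Active sonar equation: two-way transmission loss plus target strength
    tl = 20.0 * np.log10(np.maximum(r, 1e-6)) + alpha_db_per_m * r
    el = source_level_db - 2.0 * tl + target_strength_db

    # Bin echo levels into the (range, azimuth) image, keeping the strongest return
    image = np.zeros((n_range, n_azimuth))
    az_idx = ((theta + fov_deg / 2) / fov_deg * (n_azimuth - 1)).astype(int)
    rg_idx = (r / max_range_m * (n_range - 1)).astype(int)
    valid = (az_idx >= 0) & (az_idx < n_azimuth) & (rg_idx >= 0) & (rg_idx < n_range)
    np.maximum.at(image, (rg_idx[valid], az_idx[valid]), el[valid])
    return image
```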
5. Simulation Results
The performance of the developed simulator was evaluated using three types of models (see Figure 5): a seabed represented as a smooth surface, a bridge pier modeled as a simple polyhedron, and an offshore plant jacket composed primarily of pipe structures.
The dimensions of the three models are as follows: the seabed model is 10 m in length, 10 m in width, and 2 m in height; the bridge pier model is 4 m in length, 4 m in width, and 9.5 m in height; and the jacket model is 5 m in length, 5 m in width, and 8.5 m in height. While the sizes and shapes of these models may differ from their real-world counterparts, this discrepancy does not affect the trends in simulation performance.
For each environmental model, acoustic images were generated from several specific locations (see Figure 6, Figure 7 and Figure 8). The simulation parameters were based on the specifications of the Teledyne BlueView P900-45 acoustic camera (Teledyne Technologies, Thousand Oaks, CA, USA), as summarized in Table 4. Since this acoustic camera operates at a relatively low frequency, the images produced by this sensor tend to have low resolution.
Each column in Figure 6 shows a depth image and two types of acoustic images generated by the simulator. The images in each row were obtained from the same camera position in the simulator. In the third column, the images use the horizontal axis to represent azimuth angle and the vertical axis to represent range. The images in the second column align the vertical axis with the Y-axis and the horizontal axis with the X-axis in Figure 1, with both axes measured in meters.
In the acoustic images, a higher pixel value indicates greater echo intensity returned from the corresponding azimuth and range. The simulation results for the seabed model in Figure 6 show the contribution of the sonar equation. Consequently, the simulated images were able to represent echo intensity for terrain with significant elevation changes.
The simulation results for the bridge pier model in Figure 7 demonstrate that the imaging model of the acoustic camera is accurately implemented. The simulated images represent the rectangular shape of the pier without distortion. Likewise, the simulation results for the jacket platform model in Figure 8 show that the simulator is capable of producing precise images even for complex structures like the jacket platform.
Table 4. Parameters for simulating the acoustic image of the Teledyne BlueView P900-45.
Parameter | Value
---|---
Operating frequency | 900 kHz
Operating range | 0.62 to 9 m
Source level | 200 dB
Water depth | 1 to 10 m
Water temperature, salinity, pH | 15 °C, 0.5, 7
Target strength (TS) parameters | 90 (A), 4 (B), 100 (C), 1 (D), 90 (E), 8 (F)
Depth image resolution | 1500 (W) × 1200 (H)
Simulation image resolution | 106 (W) × 150 (H)
6. Experimental Results
This experimental validation compared acoustic images obtained using the Teledyne BlueView P900-45 acoustic camera with simulated images generated by the acoustic camera simulator. The images were acquired from a model of a well platform, which represents a subsea resource production facility where underwater robotic applications using acoustic cameras can be deployed.
Acoustic images were captured in a test tank filled with fresh water, allowing for control of experimental conditions (see Figure 9). The tank is rectangular, with dimensions of 15 m in length, 10 m in width, and 2 m in depth. A scaled model of a subsea well was created and placed on the bottom of the test tank. The model is a 1 m cube, with a unique pattern on each side to mimic the characteristics of a subsea well.
One acoustic camera and two optical cameras were mounted together on a frame that can move forward, backward, left, and right at a fixed height. The viewing axes of the acoustic camera and one of the optical cameras are parallel to the bottom surface of the test tank. The remaining optical camera was positioned with its viewing axis perpendicular to the bottom of the tank. To determine the positions of the cameras within the tank, artificial visual markers were placed on the bottom and photographed.
The Gazebo simulator reproduced the components and settings of the experimental environment where the actual images were acquired (see Figure 10). The parameters for simulating the acoustic images were set based on the specifications of the actual acoustic camera, the Teledyne BlueView P900-45, as summarized in Table 5. The simulated images were acquired at the same location as the actual images, and no acoustic noise was applied.
Three sets of acoustic images were acquired from the test tank. The real acoustic images suffered from significant noise due to the enclosed environment of the test tank (see Figure 11a and Figure 12a). Consequently, multiple reflections of the acoustic waves were observed in the images, and shadow areas that are typically visible in high-performance acoustic cameras were partially identifiable. The simulated images clearly displayed the shadow areas and were generated at a rate of approximately 15 Hz, which is similar to the actual acoustic camera (see Figure 11b and Figure 12b).
The comparison between real and simulated acoustic images is carried out using overlaid images, target occupancy differences, and pixel intensity differences of the two images. Here, the target occupancy difference refers to the ratio of the region occupied by the object in the simulated image to that in the real acoustic image. To calculate the occupancy difference, the images to be compared are binarized into two parts: the object-occupied region and the background region. Typically, the object regions in the two images differ, primarily due to inaccuracies in image binarization and the CAD model.
The simulated images were overlaid with the real acoustic images for both representations of the acoustic images (see Figure 11c and Figure 12c). The region corresponding to the well platform showed a high level of image alignment. Significant alignment was also observed in the shadow areas created by the well platform. Although multiple reflections from the walls of the test tank contributed considerably to image misalignment, this effect can be disregarded as it is rare in open underwater environments.
The occupancy differences are evaluated using three metrics (see Table 6), each computed from $N_s$, $N_r$, and $N_o$, where $N_s$ and $N_r$ represent the number of pixels occupied by the object region in the simulated image and in the real acoustic image, respectively (see Figure 13), and $N_o$ is the number of pixels in the overlap region obtained by the bitwise AND operation of the two object regions. These three metrics approach a value of 1 as the two images become more similar.
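A minimal sketch of plausible occupancy metrics computed from the binarized images is given below. The three ratios shown (overlap over the simulated object region, overlap over the real object region, and intersection over union) are illustrative assumptions rather than the exact metric definitions used for Table 6.

```python
import numpy as np

def occupancy_metrics(sim_binary, real_binary):
    """Compare binarized object regions of simulated and real acoustic images.
    Inputs are boolean arrays of identical shape; each ratio approaches 1
    as the two object regions become more similar."""
    n_sim = np.count_nonzero(sim_binary)                     # pixels in simulated object region
    n_real = np.count_nonzero(real_binary)                   # pixels in real object region
    n_overlap = np.count_nonzero(sim_binary & real_binary)   # bitwise AND overlap
    return {
        "overlap_over_sim": n_overlap / n_sim,
        "overlap_over_real": n_overlap / n_real,
        "iou": n_overlap / np.count_nonzero(sim_binary | real_binary),
    }
```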
For all data sets, the real and simulated images showed an occupancy error of approximately 10 to 25 percent. This difference can vary depending on the binarization parameters, so it should be interpreted as an indicator of general trends rather than an absolute measure of difference.
The intensity difference refers to the difference in the pixel value distribution between the two images. This metric indicates how similar the simulated image is to the pixel distribution of the real acoustic image, with a value closer to 0 representing a higher degree of similarity between the images. The two images showed a significant intensity error, primarily due to differences in the imaging modalities (see Table 7). For example, the simulator did not sufficiently account for background noise in the acoustic image, and there was a discrepancy in the parameters related to the material properties of the object.
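One plausible way to quantify such a distribution difference is a histogram-based distance, sketched below. The use of a total-variation distance over normalized intensity histograms is an illustrative assumption, not necessarily the metric used to produce Table 7.

```python
import numpy as np

def intensity_difference(sim_image, real_image, n_bins=64):
    """Histogram-based difference of pixel value distributions.
    Returns a value in [0, 1]; 0 means identical normalized distributions."""
    rng = (0, 255)
    h_sim, _ = np.histogram(sim_image, bins=n_bins, range=rng, density=True)
    h_real, _ = np.histogram(real_image, bins=n_bins, range=rng, density=True)
    bin_width = (rng[1] - rng[0]) / n_bins
    # Total variation distance between the two normalized histograms
    return 0.5 * np.sum(np.abs(h_sim - h_real)) * bin_width
```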