Commercial Optical and Acoustic Sensor Performances under Varying Turbidity, Illumination, and Target Distances

Acoustic and optical sensing modalities represent two of the primary sensing methods within underwater environments, and both have been researched extensively in previous works. Acoustic sensing is the premier method due to its high transmissivity in water and its relative immunity to environmental factors such as water clarity. Optical sensing is, however, valuable for many operational and inspection tasks and is readily understood by human operators. In this work, we quantify and compare the operational characteristics and the environmental effects of turbidity and illumination on two commercial-off-the-shelf sensors and an additional augmented optical method: a high-frequency, forward-looking inspection sonar; a stereo camera with built-in stereo depth estimation; and color imaging to which a laser has been added for distance triangulation. The sensors have been compared in a controlled underwater environment with known target objects to ascertain quantitative operating performance, and it is shown that optical stereo depth estimation and laser triangulation operate satisfactorily at low and medium turbidities up to a distance of approximately one meter, with errors below 2 cm and 12 cm, respectively; acoustic measurements are almost completely unaffected up to two meters under high turbidity, with an error below 5 cm. Moreover, the stereo vision algorithm is slightly more robust than laser-line triangulation across turbidity and lighting conditions. Future work will concern the improvement of the stereo reconstruction and laser triangulation by algorithm enhancement and the fusion of the two sensing modalities.


Introduction
Just as in the case above water [1][2][3], a large variety of motivating applications and solution algorithms exist for the use of sensor information in many operational contexts, such as localization and inspection, including 2D/3D reconstruction of underwater objects and scenes [4]. Acoustic sensing is the premier sensing modality used in underwater environments due to the high speed of sound and low attenuation in water [5]. At the same time, many underwater sensing tasks, such as inspection, are advantageously performed using optical cameras because they deliver high sensing resolution and are easily interpreted by operators [6]. However, optical sensing is considerably affected by turbidity, attenuation, and lighting (both natural sunlight and artificial illumination), factors which do not significantly affect acoustic methods [7,8]. Hence, the sensing modalities have complementary advantages, and combining them leads to the robustness often required in automated solutions, as noted in [9,10].
Given these complementary sensing effects, it is desirable to quantify the effects of environmental influences such as turbidity on sensing performance to elucidate the operational limitations for each sensing modality.
The contribution of this work is to reproduce and expand on previous works concerning the effect of environmental turbidity and lighting on target reconstruction by the precise control of target distance using a 3D servo-driven gantry; the recording of simultaneous stereo, color image, laser-triangulation, and acoustic imaging in the same controlled experiment; and quantitative evaluation of sensor noise and accuracy by conversion to real-world-unit point clouds for each sensor.
The hypotheses are that optical sensing accuracy will degrade with increasing turbidity; that an optimum illumination level providing the best performance exists; and that optical sensing will break down beyond a certain turbidity and target distance; contrarily, the acoustic sensor should be negligibly affected by these environmental parameters.
The rest of the paper is organized as follows: firstly, related works are outlined; secondly, the materials and methods applied in the experiments are described, including the chosen commercial sensors and the experimental facility; thirdly, the results from the sensors' raw measurements are evaluated for their operating limits and accuracy, with examples of measurements additionally illustrated; finally, the discussion summarizes the qualitative and quantitative behavior of the sensors.

Related Work
Previous investigations have pursued different objectives: for example, the reconstruction of clear, undistorted visual images from subsea imagery for presentation to operators, and the reconstruction of 2D/3D objects for object detection, segmentation, classification, and structural damage detection [4]. Both qualitative and quantitative investigations of this nature have been performed in recent years.
In O'Byrne et al. [11], an image repository was created with various target objects under varying turbidities using a setup with two waterproof cameras to test stereo reconstruction algorithms. Some algorithms for 3D reconstruction and damage detection were demonstrated on this dataset in O'Byrne et al. [12][13][14]. Just as in the case above water, structured light can be added to the scene to aid in reconstruction, as demonstrated by Aykin et al. [15] and Bruno et al. [16].
In Mai et al. [17], the fidelity of a high-frequency sonar, stereo vision, and a time-of-flight (ToF) camera in determining the distance to, and shape of, a target object was evaluated, with a focus on the comparison of sensor accuracy and noise. It was shown that stereo vision delivers the highest measurement fidelity, followed by the ToF camera, with the sonar having the lowest measurement fidelity. A ToF camera was also investigated in Risholm et al. [18], wherein the camera used a range-gated strategy to successfully reduce backscatter from turbidity, in this case to monitor fish in turbid environments.
An example of using the optical and acoustic sensing modalities together is shown in Roman et al. [19], where a high-frequency sonar, stereo imaging, and a laser triangulation method were compared for archeological 3D measurements in the Aegean Sea. It was shown that the sensing modalities all provide usable fidelity in the given environment; however, turbidity and other environmental influences were not measured. In Yang et al. [20], the emphasis was on examining sharpness and color reproduction under varying turbidity and lighting conditions using a monocular color camera. A ColorChecker and an SFR chart were used to estimate the image quality and color reproduction.
More recently, in Scott and Marburg [21], the quantitative effects of turbidity on various stereo reconstruction methods were examined, showing that stereo vision depth estimation is possible with usable robustness under low (17 NTU) and medium (20 NTU) turbidity conditions. Apart from inspection tasks, visual sensors can also be used for concurrent localization, as described in Concha et al. [22], where localization and dense mapping are demonstrated from a monocular camera sequence.

Materials and Methods
To perform the experiments, a commercial sensor was selected to embody each of the sensing modalities; then, these sensors were mounted in a rigid aluminum frame to fix the extrinsics between the sensors themselves and the target objects. First, we describe the selected sensors and their specifications; then, we describe the experimental setup, including the data acquisition and the selected target objects used in the performance evaluation; and finally, we describe the experimental procedure.

Sensors
For each sensing modality, a commercial-off-the-shelf (COTS) sensor was selected based on the maximum sensing distance used in the experiments, 2 m, while maintaining high sensing fidelity over the given distance range. The stereo and color camera modalities were both embodied by the Intel D435i camera [23], and the acoustic modality was embodied by the BluePrint subsea M3000d sonar [24].

Stereo and Color Camera
A COTS stereo camera, the Intel D435i [23], embodied the optical sensing modality. This camera was chosen for having a color imager of at least 2 megapixels as well as on-board stereo imaging; in particular, it has built-in stereo depth estimation processing (reducing the need for external computation in an end-use application). The stereo camera sensor specifications are given in Table 1. For the Intel D435i, the color imaging sensor is the OmniVision OV2740, while the stereo imaging sensors are OmniVision OV9282s. Since the stereo depth estimation is a built-in function of the camera, the main stereo-sensing specifications are listed in Table 2.

Imaging Sonar
The acoustic sensing modality was similarly embodied by a COTS forward-looking imaging sonar, the BluePrint subsea M3000d [24]; this sonar was selected for its small range resolution (<1 cm) and small angular resolution (<1°), together with a suitable minimum distance of ≤0.1 m and a maximum distance of ≥1 m. The forward-looking imaging sonar sensing specifications are given in Table 3. For the purposes of this work, the sonar was used exclusively in the high-frequency mode (shown on the right of Table 3).

Table 3. Oculus m3000d sonar manufacturer specifications; * indicates a range-dependent specification.

Laser
The laser specifications are given in Table 4. The laser was fitted with a line-generation lens immediately after the focusing lens and was mounted in a waterproof enclosure with a flat-port acrylic window. The laser was focused at approximately 2 m and was mounted to be within view of the color camera at both the minimum and maximum test distances. The closest observable distance was determined by the intersection of the laser plane with the lower plane of the camera field-of-view (FOV), and the maximum distance was determined by the intersection with the upper plane of the FOV, as shown in Figure 1.
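In a simplified side-view model, these two FOV-intersection limits follow from elementary ray geometry. The following Python sketch illustrates the idea; the baseline, laser tilt, and half-FOV values used below are hypothetical placeholders, not the actual rig parameters.

```python
import math

def laser_fov_limits(baseline_m, laser_tilt_deg, half_fov_deg):
    """Side-view model: camera at the origin looking along +z with
    vertical half-FOV half_fov_deg; the laser sits baseline_m below
    the camera, tilted upward by laser_tilt_deg toward the axis.
    Returns (z_min, z_max): the distances where the laser plane
    crosses the lower and upper FOV boundary planes, respectively."""
    ta = math.tan(math.radians(laser_tilt_deg))
    th = math.tan(math.radians(half_fov_deg))
    z_min = baseline_m / (ta + th)  # crossing of the lower FOV plane
    # The upper FOV plane is only crossed when the tilt exceeds the
    # half-FOV; otherwise the laser line never leaves the FOV above.
    z_max = baseline_m / (ta - th) if ta > th else math.inf
    return z_min, z_max
```

With a tilt just above the half-FOV, the far limit grows rapidly, which is why small mounting-angle changes move the usable range window considerably.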

Turbidity Sensor
The turbidity sensor was an optical nephelometric sensor, model Aanderaa Turbidity Sensor 4296 [25]. The sensor was mounted to measure the turbidity in the forward direction, towards the target, in an unoccluded volume, avoiding reflections from the pool's interior surfaces and the water surface. The turbidity sensor's main specifications are given in Table 5.

Experimental Setup
The experimental setup consisted of three overall parts: the sensors being tested, the test pool filled with test medium, and the test targets mounted on a 3D gantry (traverse). To ensure the extrinsics were fixed between the sensors, they were mounted on a rigid frame made of aluminum profiles, shown in Figure 2. The test pool was filled with tap water, and Kaolin [26] was used to control the turbidity. The test targets were mounted on a 3D gantry which allowed them to be moved with respect to the sensor frame, such that the distance between the target and the sensor reference planes could be varied. The complete experimental setup is shown in Figure 3. To prevent disturbances from external light sources, the experiments were conducted in a laser-safety-rated laboratory where external lighting could be reduced to near zero levels.

Target Objects
Two target objects were used during the experimental measurements: an ISO 12233:2017 edge spatial frequency response (eSFR) chart [27], used for image quality analysis and printed in a 16:9 format with near-infrared and visible reflective inkjet technology, and a metal cylinder resembling part of an offshore structure. The eSFR chart was glued to an aluminum sandwich backing plate and is shown in Figure 4a; the metal cylinder is shown in Figure 4b.

Data Acquisition
The data acquisition was performed using the Robot Operating System (ROS) Noetic on Ubuntu 20.04, running on an NVIDIA Jetson Xavier NX located within the stereo camera submersible enclosure. The Xavier NX was connected through serial communication (RS232) to the turbidity and conductivity sensors, by USB 3.1 to the stereo camera, and by gigabit Ethernet to a switch outside the experimental tank. The Xavier NX and sensors were powered using power-over-Ethernet (PoE) from the switch, apart from the sonar, which had a separate power supply and Ethernet connection. The interconnection between the sensor components and the data capture equipment is shown in brief in Figure 5; see also Figure 2 for the physical layout of the sensors.

Experimental Parameters
The experiments were conducted at a set of turbidities, target distances, and illumination settings. Table 6a lists the desired and achieved turbidities for the experiment series, including the standard deviation given by fluctuations in the turbidity sensor measurement. Table 6b lists the desired and achieved distances for the experiment series, including the measurement uncertainty; note that when transitioning to/from the far distances, the sensor frame was moved within the pool and the target distance was re-initialized using an external laser distance meter. The lighting levels used are shown in Table 6c.

Experimental Procedure
The experiments were conducted using a repetitive procedure which is also illustrated in Figure 6. The inner loop corresponds to light level variations; the intermediate loop corresponds to distance variations; and the outer loop corresponds to turbidity variations.
The procedure was designed to minimize experimental disturbances during variable changes: light level changes cause no physical movement, whereas the Kaolin content can only be increased.

1. The sensor frame is placed within the test pool.
2. The test target is reset, and the base distance is measured using a laser distance meter.
3. Measurements are performed at each distance:
(a) Measurements are performed at each light level:
i. The light level is set at the selected percentage (see Table 6c).
ii. The experiment is allowed to settle for 10 s.
iii. Sensor data are recorded in ROSbag format; then, point (i) is repeated.
(b) The distance is changed by control of the gantry (see Table 6b); then, point (a) is repeated.
4. Kaolin is added until the desired turbidity is reached (see Table 6a).
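The nested structure of the procedure can be sketched as follows; this is a minimal Python illustration in which the parameter lists and the callbacks (`set_turbidity`, `record_bag`, etc.) are hypothetical placeholders, with the actual levels given in Table 6.

```python
import time

# Hypothetical parameter values; the actual levels are listed in Table 6.
TURBIDITIES_FTU = [0.3, 2.1, 6.0]     # outer loop: Kaolin is only added
DISTANCES_CM = [40, 103, 140, 200]    # intermediate loop: gantry moves
LIGHT_LEVELS_PCT = [25, 50, 75, 100]  # inner loop: cheapest to change

def run_experiment(set_turbidity, set_distance, set_light,
                   record_bag, settle_s=10):
    """Nested-loop procedure of Figure 6: light level varies fastest
    (no physical movement), turbidity slowest (additive Kaolin)."""
    for ftu in TURBIDITIES_FTU:
        set_turbidity(ftu)                   # add Kaolin, let it mix
        for dist in DISTANCES_CM:
            set_distance(dist)               # move target via 3D gantry
            for light in LIGHT_LEVELS_PCT:
                set_light(light)
                time.sleep(settle_s)         # settle time, step 3(a)ii
                record_bag(ftu, dist, light) # ROSbag capture
```

Ordering the loops this way means the slowest and least reversible variable (turbidity) changes the fewest times.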

Results
Using the ROSbags generated through the experiments, the performance of three sensing modalities has been evaluated: stereo depth estimation based on the built-in algorithm of the Intel camera (see Appendix A.3); laser triangulation implemented through the color camera and the MATLAB triangulation algorithm (see Appendix A.1); and the high-frequency imaging sonar (see Appendix A.2). For all modalities, the measurement accuracy has been analyzed through MATLAB, as described in Appendix A.
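The common step behind this analysis is the conversion of each sensor's output into a real-world-unit point cloud. For a depth image, this is standard pinhole back-projection; the following is a minimal Python/NumPy sketch (not the MATLAB analysis code), where the intrinsics fx, fy, cx, cy are assumed to come from the camera calibration.

```python
import numpy as np

def depth_to_pointcloud(depth_m, fx, fy, cx, cy):
    """Back-project a depth image (metres) to an N x 3 point cloud
    using the pinhole model; invalid (zero-depth) pixels are dropped.
    fx, fy are focal lengths and cx, cy the principal point, in pixels."""
    h, w = depth_m.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth_m
    valid = z > 0                                   # 0 marks no estimate
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x[valid], y[valid], z[valid]], axis=-1)
```

The resulting metric points allow accuracy and noise to be compared across modalities in the same units.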

Illumination Effects
The light level naturally influences the results for the optical methods, affecting both stereo depth estimation and laser-line triangulation. Review of the sensor measurements makes it evident that for both visual methods, the optimal light level in the experiments is 50%, with an example illustrated in Figure 7b. Less illumination (25%) results in less distinct features for stereo estimation and increased laser glare, as shown in Figure 7a, while illumination levels of 75% to 100%, shown in Figure 7c,d, result in reduced contrast for the laser as well as increased backscatter, which reduces visual features in the resulting images.

Stereo Depth Estimation
For the stereo camera, the performance has been evaluated for a rectangular region of interest (ROI) in the central 20% of the depth image frame, as illustrated in Figure 8.
To determine the operational limits, the cut-off for valid distance measurements has been set at 50% valid pixels within the ROI, i.e., a pixel fill rate of >50% is considered valid. The measurement accuracy, analyzed as described in Appendix A.3, is shown in Table 7 and Figures 9-11, while an example of the depth image is illustrated in Figure 8. Note how the background of the pool is still estimated at 0.3 FTU, Figure 8a, but begins to disappear at 2.1 FTU, Figure 8b, while the target remains valid in both cases. For the cylinder geometry estimation, the results show a very high deviation, which most likely stems from an insufficient quality of the stereo intrinsic calibration, since it is evident that the eSFR plate behind the cylinder is also heavily distorted, as illustrated in Figure 12.
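The fill-rate criterion above can be sketched as follows; this is a Python/NumPy illustration (not the Appendix A.3 code), assuming the ROI spans the central 20% of each image dimension and that invalid pixels carry a zero depth value.

```python
import numpy as np

def roi_fill_rate(depth, roi_frac=0.2):
    """Fraction of valid (non-zero) depth pixels inside a centred
    rectangular ROI covering roi_frac of each image dimension."""
    h, w = depth.shape
    dh, dw = int(h * roi_frac), int(w * roi_frac)
    r0, c0 = (h - dh) // 2, (w - dw) // 2
    roi = depth[r0:r0 + dh, c0:c0 + dw]
    return np.count_nonzero(roi) / roi.size

def roi_distance_valid(depth, threshold=0.5):
    """Apply the cut-off: a distance measurement counts as valid when
    strictly more than 50% of ROI pixels carry a depth estimate."""
    return roi_fill_rate(depth) > threshold
```

A frame failing this test is excluded from the accuracy statistics rather than contributing a degraded distance estimate.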

Laser Triangulation
The laser triangulation is performed by detecting the laser-line and projecting it as described in Appendix A.1, with examples shown in Figure 13 and results shown in Table 8 and Figures 14-16. The laser triangulation has an accuracy within a single centimeter up to a range of about 50 cm, increasing to an error of 3 cm at a range of 200 cm. The behavior of the deviation over distance seems to indicate some remaining uncompensated error in the camera intrinsics calibration, since the error is non-monotonic with respect to the target distance. The sensing functions up to a distance of 103 cm for turbidities of ≤2.1 FTU (see Table 8). The laser-line is naturally much easier to detect at low turbidities due to the improved contrast, which is evident from Figure 13a,b. For the cylindrical target, the geometric reproduction accuracy is shown in Figure 17, where the detected circle has a radius close to the actual value of 5 cm; the main outliers stem from the specular reflection along the long axis of the cylinder. The deviation increases as the distance to the cylinder target increases, as shown in Figure 17c. Overall, the fidelity of the geometric reproduction is satisfactory at close distances.
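A laser-line triangulation of this kind can be sketched as follows; this is a Python/NumPy illustration, not the MATLAB implementation of Appendix A.1. It assumes a simple peak-per-column line detector on the red channel and a calibrated laser plane n · p = d in the camera frame; the intrinsics and plane parameters are placeholders.

```python
import numpy as np

def detect_laser_rows(red_channel, min_intensity=128):
    """Per image column, take the row of peak red intensity as the
    laser-line position; columns below min_intensity are rejected."""
    rows = np.argmax(red_channel, axis=0)
    cols = np.arange(red_channel.shape[1])
    ok = red_channel[rows, cols] >= min_intensity
    return rows, ok

def triangulate(rows, cols, fx, fy, cx, cy, plane_n, plane_d):
    """Intersect the camera ray of each laser pixel with the calibrated
    laser plane n . p = d (camera frame). Returns N x 3 points."""
    rays = np.stack([(cols - cx) / fx, (rows - cy) / fy,
                     np.ones_like(cols, dtype=float)], axis=-1)
    t = plane_d / (rays @ plane_n)  # ray parameter at the plane
    return rays * t[:, None]
```

The intensity threshold is what fails first under turbidity: as contrast drops, fewer columns pass the detector and the usable range shrinks.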

Acoustic (Sonar)
The sonar data have been processed using the program described in Appendix A.2, with examples shown in Figure 18 and results summarized in Table 9 and Figures 19-21. The sonar target-object distances show excellent linearity (>0.98) and a consistent, monotonic error for all turbidities. Of particular note in the resulting images is the specular acoustic artifact arising at close distances, which creates a radial high-intensity echo tangential to the plane of the target object. The cylindrical target information is illustrated with a binarized image in Figure 22, where it is clear that the cylinder is detected; however, there is a substantial amount of noise at the front and rear boundaries of the cylinder.

Discussion
In general, acoustic sensing is mostly stable across operating conditions; however, stereo vision and laser-line triangulation can also operate successfully under low and medium turbidity conditions, 0.3 FTU to 2.1 FTU, at ranges of up to 100 cm. For laser triangulation, the accuracy is relatively constant in the range of 0.3 FTU to 2.1 FTU, with a total maximum mode deviation of 1.54 (0.90) cm at a range of 103 cm. Stereo depth estimation suffers from some non-linearity and increased deviation, up to 36 cm, though it is lower (≤15 cm) at distances below 140 cm; this is most likely related to an insufficient quality of the intrinsic parameter calibration in particular, which warrants further work. At 6 FTU, the operating range is severely limited for the optical methods (laser-line and stereo depth estimation) but still usable at distances ≤43 cm for laser distance measurements and ≤63 cm for stereo depth estimation. For all of the sensors, it is possible to detect and estimate the cylinder target's geometry within 10% of the actual dimensions at distances closer than 63 cm; however, accuracy is substantially worse at longer distances.

The operating depth for the considered approaches is generally limited by the manufacturer constraints for the commercial sensors, as noted in their specifications; for laser triangulation, the operating depth is additionally limited by the amount of ambient light, so operating very close to the surface would not be possible. For the stereo camera, a large distortion remains after the execution of the built-in calibration procedures; this would most likely need to be further corrected in a real application, depending on the particular application requirements. For acoustic sensing, other environmental parameters, such as salinity or suspended particulate matter of large sizes, may be more interesting to investigate, since these are more likely to affect performance and operating limits with respect to the target distance.
In summary, this work shows that these optical methods are usable even under relatively high turbidities if they are used for operations where only short-range measurements are needed; the useful operating range increases with decreasing turbidities, up until a maximum experimental distance of 200 cm. Contrarily, ranging using the acoustic sensor is, for the purpose of detecting the target objects used under the given distances and environmental effects, unaffected, even at the highest turbidity and target distance.

Concluding Remarks
The experimental evaluation confirms the hypotheses that these optical methods provide highly detailed spatial information about the target objects and that increased turbidity affects their accuracy negatively. However, even at substantial turbidity levels, i.e., 2.1 FTU, they still provide reasonable target-object information at close ranges. Conversely, the sonar is not affected to a notable degree by turbidities of up to 6 FTU, but it provides the least amount of spatial information. In summary, this warrants investigation of sensor fusion, where the complementary advantages of the different modalities can be fully exploited. Other future work includes the possible improvement of the laser-line distance measurement algorithm to improve the operating range and the rejection of specular reflections. Alternatively, a modulated or rotational laser approach can also be investigated. Improvement of the stereo camera calibration to lower the distortion, or external processing of the stereo camera information, can be studied to ascertain whether larger operating envelopes are achievable with other algorithms; this can be performed in extension to, or as a complete replacement of, the built-in stereo depth estimation. The addition of other environmental influences, such as salinity and suspended particulate matter, may lead to additional effects worth investigating, particularly for acoustic sensing.

Appendix A.2. Sonar Data Processing
For the sonar data, the sonar image was first binarized with a threshold of 0.3 (normalized). Then, the sonar points were projected into 2D coordinates, followed by projection into 3D. The 100 closest points within 10% of the ground-truth target distance to the origin were found and used to calculate the mode and std. dev. of the target distance measurement, as shown in Listing A2.
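The sonar processing just described can be sketched as follows; this is a Python/NumPy illustration of the Listing A2 procedure under stated assumptions: the polar image has rows as range bins, an echo's distance to the sonar origin equals its range regardless of beam angle (so the Cartesian projection step can be folded out for the range statistic), and the mode is taken over 5 mm bins (a hypothetical bin width).

```python
import numpy as np

def sonar_target_distance(img, ranges_m, gt_m,
                          thresh=0.3, n_pts=100, tol=0.1, bin_m=0.005):
    """Estimate the target distance from a normalised polar sonar image
    (rows = range bins, cols = beams). Binarise at `thresh`, keep echoes
    within tol*100 % of the ground-truth distance gt_m, take the n_pts
    echoes closest to the origin, and return (mode, std) of their
    ranges, with the mode computed over bin_m-wide range bins."""
    r_idx, _ = np.nonzero(img >= thresh)      # binarise, keep echo bins
    r = ranges_m[r_idx]                       # echo ranges in metres
    keep = np.abs(r - gt_m) <= tol * gt_m     # gate around ground truth
    r = np.sort(r[keep])[:n_pts]              # closest echoes first
    binned = np.round(r / bin_m) * bin_m      # quantise for the mode
    vals, counts = np.unique(binned, return_counts=True)
    return vals[np.argmax(counts)], r.std()
```

The gate around the ground-truth distance rejects the specular artifact and boundary noise noted in the results before the mode and standard deviation are computed.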