Novel Descattering Approach for Stereo Vision in Dense Suspended Scatterer Environments

In this paper, we propose a model-based scattering removal method for stereo vision for robot manipulation in indoor scattering media where the commonly used ranging sensors are unable to work. Stereo vision is an inherently ill-posed and challenging problem. It is even more difficult for images of dense fog or dense steam scenes illuminated by active light sources. Images taken in such environments suffer from attenuation of the object radiance and scattering of the active light sources. To solve this problem, we first derive the imaging model for images taken in a dense scattering medium with a single active illumination source close to the cameras. Based on this physical model, the non-uniform backscattering signal is efficiently removed. The descattered images are then utilized as the input images of stereo vision. The performance of the method is evaluated based on the quality of the depth map from stereo vision. We also demonstrate the effectiveness of the proposed method by carrying out a real robot manipulation task.


Introduction
High spatial resolution ranging is crucial in robot manipulation, and a depth map is necessary to accomplish the task. There are many cases where the system must work in low-visibility, strongly scattering environments, such as underwater robots or firefighting robots. Our application is bipedal and quadrupedal robots working in nuclear power plants, where they must cope with poor visibility due to dense steam. When an accident occurs, the plant is filled with very dense water-based atmospheric particles, and the robot needs to operate the plant. From our experiments, commonly used sensors such as LiDAR (LMS511, SICK, Waldkirch, Germany and UTM-30LX-EW, Hokuyo, Osaka, Japan) and time-of-flight (ToF) cameras (Kinect v2, Microsoft, Redmond, WA, USA) are unable to work in such low-visibility conditions. Our conclusion is consistent with the study by Starr and Lattimer [1]. Some specialized subsea LiDAR systems (please refer to Massot-Campos and Oliver-Codina [2] for a comprehensive survey of underwater 3D reconstruction), laser line scanning [3], or structured light [4][5][6][7] are able to operate in scattering media. However, these systems are power-consuming, slow, and bulky, and thus not well suited to a walking robot. Our goal is to utilize the images from a standard stereo vision system for robot manipulation in a scattering environment; therefore, no additional hardware is required. Stereo vision has been studied intensively for decades, since retrieving the depth map of a scene is critical in many applications such as driving assistance and automated robotics. However, most state-of-the-art stereo vision methods primarily deal with high-quality images from standard datasets. Several methods have been introduced to solve stereo vision for images of foggy or underwater scenes.
Caraffa and Tarel [37] combine a photo-consistency term and atmospheric veil depth cues to formulate the problem and solve stereo and defogging jointly by utilizing the α-expansion algorithm [18]. This method is sensitive to the nonlinear camera response function and to image noise; therefore, the authors demonstrated proper results for synthetic images but not for real foggy images. Roser et al. [38] iterate between applying a conventional stereo algorithm to compute the depth and using the depth to recover the object radiance. The method, however, does not model light scattering in the stereo matching step and defogs video frames independently, which causes errors in stereo matching. Li et al. [39] solve depth reconstruction and defogging simultaneously from monocular video based on structure-from-motion (SfM). This only works when SfM can be computed; furthermore, the method is far from real-time, since 10 min per frame is reported. The studies noted above can process images obtained under natural light sources only. Negahdaripour and Sarafraz [40] use both photo-consistency and backscattering cues to estimate disparity with a local matching method. The method can be applied to images corrupted by backscattering, taken under a non-homogeneous artificial light source. The authors, however, assumed that the depth in the support window is constant, which leads to wrong estimates of the scattering signal at depth discontinuities, especially in highly non-homogeneous scattering areas.
In this study, we propose a scattering removal technique, called descattering, followed by a standard stereo method, focusing on how to remove the scattering efficiently for stereo vision. The imaging model is derived in Section 2. From this model, a model-based descattering method is proposed to remove the scattering effect. The intermediate images of the descattering method are defogged utilizing the well-known DCP [29]. Both steps are described in Section 3. Stereo vision results for dense scattering scenes, on both synthetic images and real experimental images, are shown in Section 4. The robot system and the accomplishment of a robot manipulation task are demonstrated in Section 5. Finally, Section 6 presents our conclusions.

Imaging Model
Three underlying assumptions are used in this approach:

•
The illumination source is known and close to the cameras. This is feasible since the cameras and the light source are installed in the head of the robot.

•
The scattering is single scattering. Although multiple scattering occurs, it has been shown that a single scattering model is effective for scattering removal [24,40,41].

•
The input image I is given in the actual scene radiance values. The radiance maps can be recovered by inverting the acquisition response curve proposed by Debevec and Malik [42].

Single View Modeling in a Suspended Scatterer Environment
Consider the vision system configuration in Figure 1. Let X = (X, Y, Z) and x = (x, y) be the global coordinates of a point in space and its projection onto the image plane, respectively. R_s(X) and R_c(X) are the distances from a point X to the light source and to the left camera, respectively. R_c0(x_obj) is the distance at which the light field first intersects the line of sight (LOS); it is unique for every pixel x_obj in the image. I_s(X) is the irradiance of a point in space illuminated by the point light source, θ is the backscattering angle, and B is the baseline. The measured intensity can be modeled as a linear combination of the attenuated radiance R(x_obj) (red line; the attenuated fraction of the object radiance L_obj(x_obj)) and the backscattering component S(x_obj) (blue line):

I(x_obj) = R(x_obj) + S(x_obj). (1)

Note that single scattering is assumed and that image blur due to forward scattering [41] is not taken into account. The attenuated signal is

R(x_obj) = L_obj(x_obj) D(x_obj), (2)

where the direct transmission is

D(x_obj) = e^(−c R_c(X_obj)), (3)

and c is the attenuation coefficient (or extinction coefficient) of the environment due to absorption and scattering. The object radiance is given by

L_obj(x_obj) = ρ(x_obj) I_s(X_obj), (4)

where ρ(x_obj) is the object reflectance. The irradiance of a point in space illuminated by a point light source of intensity L_s is

I_s(X) = L_s Q(X) e^(−c R_s(X)) / R_s²(X), (5)

where Q(X) expresses the non-uniformity of the illumination source and the falloff 1/R_s²(X) is caused by free-space light propagation. Since the illuminator-camera baseline is very small compared with the object distance, R_s(X) ≈ R_c(X) = ‖X‖. Substituting Equations (3)-(5) into Equation (2), we obtain

R(x_obj) = L_s Q(X_obj) ρ(x_obj) e^(−2c‖X_obj‖) / ‖X_obj‖². (6)

The total backscattering signal that the camera receives is

S(x_obj) = ∫ b[θ(X)] I_s(X) e^(−c R_c(X)) dR_c(X),  X ∈ X_LOS, (7)

where b[θ(X)] is the phase function of backscattering and the integral is taken along the LOS from R_c0(x_obj) to R_c(X_obj). The LOS from the camera to the object is

X_LOS = (Z/f)(x, y, f), (8)

where f is the cameras' focal length.
To simplify the analysis, let us assume that b[θ(X_LOS)] ≈ b is constant over the field of view, which is supported by [23,24], and that Q(X) ≈ Q(X_obj) is constant along the LOS, which is justified by the small camera-illuminator baseline. If there are several sources, Equation (7) applies to each source; accumulating the integral over all sources yields the total backscattering. Equation (7) becomes

S(x_obj) = b L_s Q(X_obj) ∫ e^(−2c R_c(X)) / R_c²(X) dR_c(X),  X ∈ X_LOS. (9)

Treibitz and Schechner [23] derived the analytic solution of the integral in Equation (9) as Equation (10) and its approximation as

S(x_obj) ≈ S_∞(x_obj) [1 − e^(−k (R_c(X_obj) − R_c0(x_obj)))], (11)

where S_∞(x_obj) ∝ b L_s Q(X_obj) (considering Equation (10)) denotes the saturated backscattering value. It is worth noting that the non-uniformity of S_∞(x_obj) is attributed to the anisotropic pattern Q(X_obj) in this special case. The constant parameter k depends on R_c0(x_obj), c and b. From Equation (11), the rate at which S(x_obj) increases with X_obj is set by the parameter k. Since the illuminator-camera baseline is very small compared to the object distance, in widefield lighting we have R_c0(x_obj) ≪ R_c(X_obj), thus R_c0(x_obj) ≈ 0. Substituting Equations (6) and (11) into Equation (1), and noting that R_c(X_obj) = ‖X_obj‖, the image intensity becomes

I(x_obj) = L_s Q(X_obj) ρ(x_obj) e^(−2c‖X_obj‖) / ‖X_obj‖² + S_∞(x_obj) (1 − e^(−k‖X_obj‖)). (12)

Equation (12) resembles Koschmieder's law, which models daytime outdoor fog. The major difference is that in our case S_∞(x_obj) is spatially variant.
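The composite model in Equation (12) is straightforward to simulate. The sketch below (NumPy) renders a toy scene with the two terms of the model; the scene radiance, distances, attenuation coefficient c, decay parameter k, and the S_∞ pattern are all illustrative values, not the paper's calibration.

```python
import numpy as np

def corrupt_image(radiance, dist, s_inf, c=1.0, k=0.8):
    """Simulate Equation (12): attenuated object signal plus a
    spatially varying saturated backscatter term."""
    # Direct term: object radiance decayed by e^{-2c||X||}
    # (the 1/||X||^2 falloff is assumed folded into `radiance`).
    attenuated = radiance * np.exp(-2.0 * c * dist)
    # Backscatter term: S_inf(x) * (1 - e^{-k||X||}).
    backscatter = s_inf * (1.0 - np.exp(-k * dist))
    return attenuated + backscatter

# Toy 4x4 scene: uniform radiance, depths 0.5-2.5 m (the range used
# for the synthetic data in Section 4), non-uniform S_inf.
radiance = np.full((4, 4), 0.9)
dist = np.linspace(0.5, 2.5, 16).reshape(4, 4)
s_inf = np.linspace(0.2, 0.6, 16).reshape(4, 4)
img = corrupt_image(radiance, dist, s_inf)
```

At large distances the direct term vanishes and the intensity saturates to S_∞, which is exactly the property exploited for calibration in Section 3.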

Stereo Modeling in Suspended Scatterer Environment
In a stereo vision system, using Equations (4) and (5), Equation (12) becomes

I^i(x^i_obj) = L_s Q(X^i_obj) ρ(x_obj) e^(−2c‖X^i_obj‖) / ‖X^i_obj‖² + S^i_∞(x^i_obj) (1 − e^(−k‖X^i_obj‖)),  i = L, R, (13)

where X^i_obj, i = L, R, are the coordinates of a point X_obj in space with respect to the left and right cameras, respectively. Noting that the global coordinates and the left camera coordinates are the same (X^L_obj = X_obj), we have the relationship

X^R_obj = X^L_obj − (B, 0, 0). (14)

In Equation (13), the image coordinates of a point in space projected onto the rectified left and right images are x^i_obj, i = L, R. We also have

x^R_obj = x^L_obj − (d(x^L_obj), 0), (15)

where d(x^L_obj) is the disparity map that pairs up corresponding pixels x^L_obj, x^R_obj. In general, the system setup is more complicated than the one derived in Section 2.1: the lighting geometry cannot be ignored, and there are several sources. In such cases, Treibitz and Schechner [23,24] show that the backscatter still follows the approximated model in Equation (11). However, S^i_∞(x^i_obj) depends not only on the anisotropic pattern Q(X^i_obj) of the light source and the scattering parameters c and b, but also on the lighting geometry. The smaller the camera-illuminator baseline is, the stronger the non-uniformity is. Equation (5) shows that a LOS closer to the light source receives a stronger backscattering signal, because the irradiance I_s(X) is very strong where the light field first meets the LOS. Thus, the two cameras sense different backscattering signals, depending on their geometric relationships to the light. That makes stereo vision in scattering media more problematic because the intensity of the same object can be significantly different in the two views. Figure 2a-c show the stereo pair of a clear scene, a foggy scene with natural light, and a foggy scene with an artificial light source, respectively. In the first example, since the images are taken in a clean environment, texture and contrast are preserved. Therefore, these images can be processed directly by conventional, well-developed stereo vision algorithms.
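For reference, the rectified-stereo bookkeeping above reduces to two one-liners. In the sketch below, only the 10 cm baseline comes from our experimental setup; the focal length and disparity values are made-up numbers.

```python
def pixel_in_right(x_left, disparity):
    """Correspondence on rectified images: x_R = x_L - d(x_L)."""
    return (x_left[0] - disparity, x_left[1])

def depth_from_disparity(d_px, focal_px, baseline_m):
    """Standard rectified-stereo relation Z = f * B / d."""
    return focal_px * baseline_m / d_px

# 0.10 m baseline as in our first setup; 700 px focal length is illustrative.
z = depth_from_disparity(35.0, 700.0, 0.10)
```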
Figure 2b depicts the synthetic stereo images of the scene shown in Figure 2a in the presence of fog under natural light. In this case, the imaging model obeys Koschmieder's law [25]. Due to attenuation, the greater the distance the signal propagates over, the weaker the object radiance that the cameras receive. Thus, the contrast of these objects (inside the yellow rectangle) is low. Additionally, since the natural light is assumed to be parallel and uniform, the cameras capture a scattering signal that depends on the air attenuation coefficient and the object distance. Although the poor contrast makes it difficult to obtain a depth map from these images, the photo-consistency still holds. Figure 2c represents an even more complicated case: the synthetic images of a scene under a foggy condition illuminated by an artificial light source that is installed under the two cameras. Besides suffering from poor contrast due to attenuation, the light adds a different scattering signal to each camera, depending on the lighting geometry. Consequently, the brightness of one object in the two images (inside the red rectangle) is not identical. Thus, the photo-consistency does not hold.


Light Compensation
The first step is light compensation, which removes the non-uniformity of the backscattering. The measured image, modeled in Equation (13), is divided by the saturated backscattering signal to obtain the light compensated image

Ĵ^i(x^i_obj) = I^i(x^i_obj) / S^i_∞(x^i_obj) = L^d,i_obj(x^i_obj) e^(−k‖X^i_obj‖) + (1 − e^(−k‖X^i_obj‖)), (16)

where the distorted radiance of the object is defined as

L^d,i_obj(x^i_obj) = [L^i_obj(x^i_obj) / S^i_∞(x^i_obj)] e^(−(c−k)‖X^i_obj‖). (17)

Here L^i_obj(x^i_obj)/S^i_∞(x^i_obj) is a spatially varying value that depends on the geometric configuration of the light. This means that the light compensation step introduces many local radiometric differences in the object signal; however, these are compensated after defogging. The light compensated image in Equation (16) is similar to Koschmieder's law with the airlight equal to 1. Note that L^d,i_obj(x^i_obj) in Equation (17) is neither the reflectivity nor the radiance of the object. It is, however, an enhanced image, with radiometric distortion, recovered from the original image corrupted by strong backscattering.
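Numerically, the light compensation step is a per-pixel division by the pre-calibrated S_∞ map. A minimal sketch (NumPy, with a small epsilon guard added for safety):

```python
import numpy as np

def light_compensate(image, s_inf, eps=1e-6):
    """Equation (16): divide the measured image by the saturated
    backscattering map to cancel its spatial non-uniformity."""
    return image / np.maximum(s_inf, eps)

# If the scene signal followed the S_inf pattern exactly, the
# compensated image would come out flat:
s_inf = np.array([[0.2, 0.4], [0.6, 0.8]])
flat = light_compensate(0.5 * s_inf, s_inf)
```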

Saturated Backscattering Estimation
Based on Equation (11), the saturated backscattering can be estimated directly:

S^i_∞(x^i_obj) = lim_(‖X^i_obj‖ → ∞) I^i(x^i_obj). (18)

Thus, the saturated backscattering can be pre-calibrated by taking void images in which there is no object (‖X^i_obj‖ → ∞). In our experiment, due to space limitations, we took pictures of very dense steam and fog scenes (e.g., c ≈ 2.3 m−1) where no object can be seen. As derived in Section 2.2, the saturated backscattering depends on the attenuation coefficient c. However, from Equation (10), we can obtain the relationship

S^i_∞,c(x^i_obj) = k_c Ŝ^i_∞(x^i_obj), (19)

where Ŝ^i_∞(x^i_obj) is the pre-calibrated saturated backscattering and k_c < 1 is a constant gain that depends on the attenuation coefficient. The constancy of k_c at a specific attenuation coefficient was confirmed by our experiment; for example, k_(c=1.5) = 0.86 ± 0.05. Figure 3 illustrates the saturated backscattering signal of two different system configurations. The images are the original images without any color correction. In the first setup, the light was placed under the cameras, and steam was generated by a steam generator using pure water. In the second setup, the light was placed above the cameras, and the fog was produced by a fog machine using oil.
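A minimal calibration sketch, assuming a stack of "void" frames is available; the 0.86 gain is the measurement reported above for c = 1.5 m−1, while the frame values are illustrative:

```python
import numpy as np

def calibrate_s_inf(void_frames):
    """Average frames of dense fog with no visible object, where the
    camera records only the saturated backscatter."""
    return np.mean(np.stack(void_frames), axis=0)

def scale_s_inf(s_inf_cal, k_c=0.86):
    """Transfer the calibrated map to another fog density via the
    constant gain k_c < 1."""
    return k_c * s_inf_cal

s_cal = calibrate_s_inf([np.ones((2, 2)), 3.0 * np.ones((2, 2))])
s_work = scale_s_inf(s_cal)
```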



DCP-Based Defogging
DCP [29] is employed to remove the fog (a process called defogging) from the light compensated image in Equation (16). Let us summarize the DCP proposed by He et al. [29]. The dark channel of the light compensated image is defined as

Ĵ^dark(x_0) = min_(x ∈ Ω(x_0)) [ min_(ch ∈ {r,g,b}) Ĵ^ch(x) ], (20)

where Ω(x_0) is the local patch centered at x_0. Since the airlight of the light compensated image equals 1, the patch transmission is then calculated as

τ̃_k(x_0) = 1 − Ĵ^dark(x_0). (21)

Different from the original DCP method, we employ guided image filtering [17] to refine the raw transmission map in Equation (21) in order to obtain τ_k(x). The distorted object radiance can be obtained by inverting Equation (16):

L^d,i_obj(x^i_obj) = [Ĵ^i(x^i_obj) − 1] / max(τ_k(x^i_obj), τ_0) + 1. (22)

The transmission can be very close to zero; thus, it is restricted to the lower bound τ_0. There is radiometric distortion in the distorted object radiance, as shown in Equation (17). Therefore, to preserve the photo-consistency between the left and right images, the radiometric distortion must be eliminated. This can be done easily by multiplying the distorted object radiance by the saturated backscattering S^i_∞(x^i_obj) to obtain the modified object radiance:

L^i_obj(x^i_obj) = L^d,i_obj(x^i_obj) S^i_∞(x^i_obj). (23)

It is also worth noting that L^i_obj(x^i_obj) is not the original radiance of the object. The factor e^(−(c−k)‖X^i_obj‖) either attenuates (when −(c − k) < 0) or amplifies (when −(c − k) > 0) the original object radiance. However, in our experiment, the modified radiance images are useful both for reviewing the scene and for reconstructing the depth map. For simplicity, we call L^i_obj(x^i_obj) a defogged image in our paper.
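A compact sketch of this defogging chain with the airlight fixed at 1 (NumPy). A plain sliding-window minimum stands in for the min-filter of He et al., and the guided-filter refinement of the transmission map is omitted here:

```python
import numpy as np

def dark_channel(img, patch=15):
    """Per-pixel channel minimum followed by a patch minimum."""
    mins = img.min(axis=2)
    pad = patch // 2
    padded = np.pad(mins, pad, mode='edge')
    out = np.empty_like(mins)
    for i in range(mins.shape[0]):
        for j in range(mins.shape[1]):
            out[i, j] = padded[i:i + patch, j:j + patch].min()
    return out

def defog(j_comp, patch=15, t0=0.1):
    """With airlight 1, transmission is 1 - dark channel (Eq. (21));
    inverting Equation (16) recovers the distorted object radiance."""
    t = np.maximum(1.0 - dark_channel(j_comp, patch), t0)  # lower bound t0
    return (j_comp - 1.0) / t[..., None] + 1.0
```

Multiplying the result by S_∞ then yields the modified (defogged) radiance, as described above.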
DCP was designed for natural images, and its assumption may not hold for indoor human-made scenes, mainly because of specular reflections [43]. By utilizing the active polarization system [24] (explained in Appendix A), the specular reflection can be removed; thus, we verified that the DCP works properly in our system.

Normalization-Based Image Correction
From our observation, when the fog is very dense and uniform, the modified direct transmission τ_k(x_obj) is almost constant and very small; thus, the backscatter S^i(x_obj) is close to its saturation S^i_∞(x_obj). Consequently, the minimum intensity of the light compensated image is set by the atmospheric veil 1 − τ_k(x_obj). Therefore, by normalizing the light compensated image Ĵ^i(x^i_obj), we can efficiently remove the atmospheric veil and scale L^d,i_obj(x^i_obj) at the same time. The normalization image is defined as follows:

Ĵ^i_n(x^i_obj) = [Ĵ^i(x^i_obj) − min Ĵ^i] / [max Ĵ^i − min Ĵ^i]. (24)

Sensors 2017, 17, 1425

This image is an approximation of L^d,i_obj(x^i_obj) e^(−k‖X^i_obj‖) in Equation (16). Then, to remove the radiometric distortion, we define the compensated normalization image as

R^i_n(x^i_obj) = Ĵ^i_n(x^i_obj) S^i_∞(x^i_obj). (25)

Only scattering removal is involved in this method; the attenuation is not removed, so the image still suffers from poor contrast. From that physical meaning, we call R^i_n(x^i_obj) a descattered image. We will show in Section 4 that this method is feasible for stereo vision in uniform steam environments; however, it fails in the case of non-uniform steam. Figure 4 shows our descattered and defogged results. The first row shows images when the fog is uniform, while the second row depicts the images in the case of non-uniform fog. The image in Figure 4a was taken in a very dense fog environment (c = 1.6 m−1) with lighting setup 2 in Figure 3. Figure 4b illustrates the light compensated image Ĵ^i(x^i_obj) of the input image, scaled into [0, 1] for visualization. Figure 4c,d show the descattered and defogged results from the proposed method, respectively. Figure 4e,f are the nighttime dehazed results of Zhang et al. [35] and Li et al. [36], respectively. In the case of uniform fog, it can be seen that both the descattered and defogged images from our method are better than those of [35,36].
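The normalization-based path is even simpler. A sketch assuming a global min-max normalization of the light compensated image (one plausible reading of the normalization step), followed by the S_∞ compensation:

```python
import numpy as np

def descatter(j_comp, s_inf):
    """Strip the near-constant atmospheric veil by min-max
    normalization, then multiply by S_inf to undo the radiometric
    distortion introduced by light compensation."""
    j_n = (j_comp - j_comp.min()) / (j_comp.max() - j_comp.min())
    return j_n * s_inf

j = np.array([[1.0, 2.0], [3.0, 5.0]])
r_n = descatter(j, np.ones((2, 2)))
```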


The method of [35] is incapable of removing image glow, whereas in the result of [36], the dark area becomes very dark. In the case of non-uniform fog, our defogging method and the method in [36] show a better ability to remove non-uniform fog. The result of [36], however, still makes the dark area darker.

Real Experiment Setup
Two system configurations were used in the real experiments:

•
In the first setting, the stereo baseline is 10 cm. The light is placed under the cameras; the light source and cameras are not coaxial. The experiment was conducted in a booth with dimensions of 3 × 1.5 × 1.6 m3. We utilized a steam generator to generate steam from pure water inside the cabin. The generated steam's temperature is 100-120 °C. Our system is able to produce steam with an attenuation coefficient of up to 1.15 m−1.

•
In the second setup, the stereo vision system is the same as in the previous configuration. However, the light source is placed above the cameras and coaxial with them. This experiment was done in a room with dimensions of 6 × 4 × 2.5 m3. To generate fog in such a big room, we utilized a fog machine (CHAMP-1500W, Joongang Special Lights, Seoul, Korea) that uses oil.
We make use of visibility to estimate the steam and fog density. Visibility is a measure of the distance at which an object can be clearly discerned from the background. Visibility V is calculated as

V = C / c, (26)

where C is a constant depending on the contrast ratio. Contrast ratios are between 0.018 and 0.03; a contrast ratio of 0.02 is usually used to calculate the visual range, giving C = 3.912. The attenuation coefficient is calculated as

c = −(1/L) ln(I / I_0), (27)

where L is the distance that the light travels from the source to the receiver, and I_0 and I are the intensities measured when the light travels in the clear condition and in the foggy condition, respectively. To measure the attenuation coefficient c and then the visibility V, a HeNe laser (wavelength of 632.8 nm and power of 0.8 mW) and a photodiode sensor (S120C), both from Thorlabs, Newton, NJ, USA, were employed as the emitter and receiver, respectively. It should be noted that the attenuation coefficient c is wavelength-dependent: the longer the wavelength is, the higher the attenuation coefficient c is.
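The two measurements combine as follows: a Beer-Lambert fit for c and the Koschmieder visual range for V. The sample intensity readings are made-up numbers:

```python
import math

def attenuation_coefficient(i_clear, i_fog, path_m):
    """Beer-Lambert fit: c = -ln(I / I0) / L."""
    return -math.log(i_fog / i_clear) / path_m

def visibility(c, contrast_ratio=0.02):
    """Koschmieder visual range V = -ln(contrast ratio) / c,
    i.e. V = 3.912 / c for the usual 2% threshold."""
    return -math.log(contrast_ratio) / c

c = attenuation_coefficient(1.0, math.exp(-3.0), 2.0)  # toy reading over a 2 m path
v = visibility(c)
```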

Stereo Results from Synthetic Images
Twelve datasets (Middlebury 2014 stereo datasets) from [9] were selected and used to generate synthetic data. The images were resized by half. We created synthetic images based on our imaging model derived in Section 2 with the provided ground truth disparity map. We normalized and scaled the ground truth depth map into a range from 0.5 m to 2.5 m. In the attenuated signal term, the non-uniformity of the illumination source is negligible; only the attenuation of the object radiance (from the original images) is considered. A backscattering signal is added to the images based on our real pre-calibrated saturated backscattering signal S^i_∞(x^i_obj). The criterion for evaluating the quality of the disparity map from the synthetic images is the percentage of good matching pixels [8]. A threshold value of one was used: if the difference between the estimated disparity and the ground truth is larger than one, the pixel is considered a bad pixel; otherwise, it is a good pixel.
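This metric takes only a few lines; the optional `valid` mask for pixels with known ground truth is an implementation detail we add for illustration:

```python
import numpy as np

def good_pixel_rate(disp_est, disp_gt, thresh=1.0, valid=None):
    """Percentage of pixels whose disparity error is at most `thresh`
    (the good-matching-pixels criterion of [8])."""
    if valid is None:
        valid = np.isfinite(disp_gt)
    err = np.abs(disp_est - disp_gt)
    return 100.0 * np.count_nonzero(err[valid] <= thresh) / np.count_nonzero(valid)

rate = good_pixel_rate(np.array([1.0, 2.0, 3.0, 10.0]),
                       np.array([1.0, 2.5, 4.5, 10.0]))
```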
We found that our descattered images R^i_n(x^i_obj), derived in Section 3.2.2, without DCP-based defogging, provide a better stereo result in the case of dense uniform steam. However, for images of non-uniform steam scenes, the defogged images L^i_obj(x^i_obj), derived in Section 3.2.1, work better. The reason is that DCP-based defogging relies on statistics; thus, the estimated transmission may not be accurate. Therefore, the color, which is very sensitive to the transmission map, is less similar between the left and right images after defogging, which causes wrong matching. The descattered image, on the other hand, is very close to the modified object radiance, because the modified transmission τ_k(x_obj) is almost constant in a dense scatterer environment. In the case of non-uniform fog or steam, c and k are spatially varying, so the above assumption does not hold. In this case, DCP-based defogging can remove the non-uniformity of the fog in the image; thus, the stereo vision quality of defogged images is better than that of descattered images. This is demonstrated on both synthetic images in this section and real images in the next section.
Semi-global matching (SGM) [44] was employed as the stereo algorithm in our real robot manipulation task. Table 1 compares the disparity map quality between descattered and defogged images under two conditions, namely uniform steam (V = 3 m) and non-uniform steam (V ∈ [3, 4] m). When dealing with images corrupted by uniform dense steam, descattered images are about 10% better than defogged images. In the case of non-uniform steam, defogged images provide a 7% better result. Thus, the choice between descattered and defogged images depends on whether the environment is uniform.
For evaluation, we compared the disparity maps from our descattering and defogging method with those of backscatter-corrupted images, Negahdaripour and Sarafraz [40], Zhang et al. [35], and Li et al. [36]. The method in [40] improves stereo matching by incorporating backscattering cues; it is a local matching method and obtains the depth map directly. The authors utilized the Normalized Sum of Squared Differences (NSSD) with mean subtraction as the matching cost. The nighttime dehazing methods in [35,36] can improve the visibility of a hazy image of a scene illuminated by active light sources. We implemented the method in [40] and ours in Matlab, while the authors of [35,36] provided their software in C and Matlab, respectively. We can freely choose the stereo algorithm to process our descattered and defogged images. However, since the method in [40] is based on NSSD, we processed the other images in the stereo vision step using the same matching cost function for a fair comparison. It should be noted that in our robot manipulation we employed SGM. Table 2 summarizes the stereo vision results using NSSD in three conditions, namely lighting setups 1 and 2 with uniform fog, and lighting setup 1 with non-uniform fog. The data are the average correct rate over the 12 datasets. In the case of uniform fog, our descattered images were used for stereo vision. In lighting 1, the proposed method shows at least a 14% higher correct rate than all the other methods. The stereo results obtained from corrupted images, from dehazed images using the method in [36], and those obtained by using the method in [40] are almost identical, while the stereo results obtained from dehazed images using the method in [35] are worse than those from corrupted images. There are several reasons for this. First, NSSD is capable of compensating for offset and gain [45]; thus, it already works well on corrupted images.
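For clarity, the matching cost used in this comparison can be sketched as a zero-mean, unit-norm SSD over a support window; this normalization is what makes the cost invariant to local offset and gain changes:

```python
import numpy as np

def nssd(patch_l, patch_r, eps=1e-12):
    """NSSD with mean subtraction: subtract the window mean and
    normalize the energy before taking the squared difference."""
    a = patch_l - patch_l.mean()
    b = patch_r - patch_r.mean()
    a = a / (np.sqrt((a ** 2).sum()) + eps)
    b = b / (np.sqrt((b ** 2).sum()) + eps)
    return ((a - b) ** 2).sum()

p = np.array([[1.0, 2.0], [3.0, 4.0]])
cost_same = nssd(p, 2.0 * p + 5.0)   # offset/gain change: near-zero cost
cost_diff = nssd(p, p[::-1].copy())  # genuinely different patch
```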
As mentioned in Section 1, the method in [40] assumed that the depth in the support window is constant, which leads to wrong estimates of the scattering signal at depth discontinuities, especially in highly non-homogeneous scattering areas. In the datasets with lighting 1, there is strong backscattering at the areas of high depth discontinuity, as in the example of the Pipes dataset shown in Figure 5. Therefore, there is no improvement over the corrupted images. The method in [35] provides the worst results because it is unable to remove the strong backscatter in the image due to its imaging model. The method in [36] has the ability to remove glow and hence works better than [35]. In lighting 2, as shown in Figure 6, the light illuminates the scene from above the cameras; thus, a strong backscattering signal projects into the upper area of the images. In these datasets, those regions have fewer depth discontinuities. Consequently, the disparity map correct rate obtained by using the method in [40] is about 11% greater than that of the original corrupted images. The nighttime dehazing methods in [35,36] and our method show correct rates identical to those in the previous case. It should be noted that our disparity map quality is the best and is 20% higher than the disparity obtained from the input images. In the case of non-uniform steam, the results of dehazed images from [36] and our defogged images have almost the same quality, slightly higher than the others. Since in the real system we employ SGM, the proposed method is also compared with backscatter-corrupted images and [35,36] using SGM as the stereo algorithm, as shown in Table 3 and an example in Figure 6. In this case, SGM performs worse than NSSD on corrupted images, while it performs better on dehazed images from [35,36] and ours. When using SGM, the method in [35] provides slightly better quality than the original images.
In the case of uniform fog, the proposed method improves the matching rate by about 35% and 20% compared with the input images and the dehazed images obtained by using the method in [36], respectively. In the case of non-uniform steam, our method and the method in [36] are nearly the same, being 10% greater than the inputs.
An example of synthetic images of Pipes [9]; the stereo method is NSSD: (a) Lighting 1uniform; (b) Lighting 1-non-uniform. The first column is corrupted images. The second column shows the disparity map from input images and the one obtained by using the method in [40]. "N&S" stands for Negahdaripour and Sarafraz [40]. The third to last columns are the defogged (or descattered) images and disparity maps using the methods from [35,36] and the proposed method, respectively. "Disp." and "Defog." stand for disparity map and defogged image, respectivley.
Since in the real system we employ SGM, the proposed method is also compared with the backscatter-corrupted images and with [35,36], using SGM as the stereo algorithm, as shown in Table 3 and an example in Figure 6. In this case, SGM performs worse than NSSD when using the corrupted images, while it performs better when using the dehazed images from [35,36] and ours. When using SGM, the method in [35] provides slightly better quality than the original images. In the case of uniform fog, the proposed method improves the matching rate by about 35% and 20% compared with the input images and the dehazed images obtained by using the method in [36], respectively. In the case of non-uniform steam, our method and the method in [36] are nearly the same, being 10% greater than the inputs.

Figure 6. An example of synthetic images of Motor in the case of lighting 2 and uniform fog: The first column is corrupted images; the second is the disparity from the input images; the third to the last columns are the defogged (or descattered) images and disparity maps obtained by using the methods in [35,36] and the proposed method, respectively.
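As a reference for how the NSSD scores above are produced, a minimal winner-takes-all stereo matcher with a normalized-SSD cost can be sketched as follows. This is a generic textbook implementation in Python/NumPy, not the authors' code; the window size and disparity range are illustrative assumptions.

```python
import numpy as np

def box_sum(img, win):
    """Sliding win x win window sum via an integral image (valid region only)."""
    ii = np.pad(img, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)
    return (ii[win:, win:] - ii[:-win, win:]
            - ii[win:, :-win] + ii[:-win, :-win])

def nssd_disparity(left, right, max_disp=16, win=5):
    """Winner-takes-all stereo matching with a normalized-SSD (NSSD) cost.

    left/right: rectified grayscale float images of equal shape.
    Returns an integer disparity map for the valid (cropped) region.
    """
    H, W = left.shape
    costs = []
    for d in range(max_disp):
        # Shift the right image so that right pixel (x - d) aligns with left pixel x.
        shifted = np.zeros_like(right)
        shifted[:, d:] = right[:, :W - d]
        ssd = box_sum((left - shifted) ** 2, win)
        # Normalize by the window energies so the cost is contrast-insensitive.
        norm = np.sqrt(box_sum(left ** 2, win) * box_sum(shifted ** 2, win)) + 1e-9
        costs.append(ssd / norm)
    return np.argmin(np.stack(costs), axis=0)
```

A pixel's disparity is simply the shift with the lowest normalized cost; SGM additionally aggregates such costs along scanlines, which is why the two behave differently on the corrupted images.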


Stereo Vision Results from Real Images
In Section 2, it is assumed that the input image I_i(x_i^obj) is given in actual scene radiance values. The radiance maps can be recovered by inverting the acquisition response curve proposed by Debevec and Malik [42]. This is the only preprocessing step employed in our experiment. This step also helps reduce variations in color produced by the two different cameras in the stereo vision system. Figure 7 shows a comparison of the depth map quality between the descattered and defogged images from the proposed method under two kinds of conditions, namely, uniform (V = 2.4 m) and non-uniform steam. When dealing with images corrupted by uniform dense steam, descattered images are better than defogged images. In the case of non-uniform steam, however, defogged images provide the better result. This is consistent with the simulation results shown in Table 1.
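A sketch of this preprocessing step: once the inverse response curve has been calibrated offline, recovering relative radiance is a per-pixel lookup. The gamma-2.2 curve below is only an illustrative stand-in for a response calibrated with the Debevec-Malik procedure.

```python
import numpy as np

def radiance_from_intensity(img_u8, inv_response):
    """Map 8-bit pixel values to relative scene radiance.

    inv_response: length-256 array where inv_response[z] is the relative
    radiance that produced pixel value z. In practice this table comes from
    an offline calibration (e.g., Debevec-Malik); here we only apply it.
    """
    return inv_response[img_u8]

# Illustrative inverse response: assume the camera applied gamma-2.2 encoding.
inv_response = (np.arange(256) / 255.0) ** 2.2
```

Applying the same table to both cameras also reduces the inter-camera color differences mentioned above, since both images end up in a common radiometric space.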
Several real experiment results are depicted in Figures 8 and 9. Figure 8a,b show two examples of lighting setup 1 in dense uniform steam (V is 4.24 and 3.39 m, respectively) using NSSD. In Figure 8a, the proposed method performs the best and more depth detail can be reconstructed, while [40] shows the worst result in reconstructing the chair. The reason for this is the assumption of [40] mentioned in the previous section. The method in [40], however, has a better ability to estimate the background depth. Figure 8b shows a similar trend. Figure 8c,d illustrate examples of non-uniform fog under setup 2 using NSSD. In both cases, the valve is tilted at an angle of 20° to 30° with respect to the cameras' optical axis, and the distance from the center of the valve to the cameras is 1.2 m. In both cases, the proposed method outperforms the input images and [35,36,40] in reconstructing the depth of the object, especially the valve. The depth results from the input images and those obtained by using the method in [40] are the worst in both cases, especially in strong backscattering regions. In Figure 8d, the method in [35] performs better than that in [36] because the dehazed images of [36] are very dark in the lower areas. Figure 9 depicts examples under setup 2 using SGM and the effect of polarization. In Appendix A, we discuss the active polarization lighting and the effects of polarization. Figure 9a,c show the results when the polarization angle is 90°. In both cases, the proposed method outperforms the input images and [35,36] in reconstructing the depth of the object, especially the valve.
Figure 8. The first column is corrupted images; the second column shows the depth maps from the input images and those obtained by using the method in [40]; the third to last columns are the defogged (or descattered) images and disparity maps using the methods in [35,36] and the proposed method, respectively. The number under every depth map is the measured depth at the red dot.
Figure 9. The first column is corrupted images; the second is the disparity from the input images; the third to last columns are the defogged (or descattered) images and disparity maps obtained using the methods in [35,36] and the proposed method, respectively. The number under every depth map is the measured depth at the red dot.
For every method, utilizing orthogonal polarization provides a better result than using a polarization angle of 45°. Directly using the input images does not work well at either polarization angle. One important observation is that all methods can estimate the distance to the center of the valve accurately. Our system is better since it provides more reconstructed points. Finally, another crucial factor in utilizing a vision algorithm in a real robot application is real-time capability. Table 4 shows the processing time to obtain the descattered or defogged images. We took the average processing time when processing 100 images continuously. The software and code run in different environments: the authors of [35] provided their software as an executable file in a C++ environment, while the authors of [36] provided a protected function run in Matlab. We implemented our descattering and defogging method in Matlab (a non-optimized implementation). Thus, this is not a fair comparison. Nevertheless, we demonstrate the near real-time capability of our descattering method to enhance the input images for the stereo vision system, with a processing time of 34 ms for a single image.

Verification with Robot Manipulation
To verify the proposed algorithm, we successfully demonstrated robot manipulation in a foggy condition. In this section, the robot system of the manipulator is introduced, and the results of a valve-turning mission in a foggy condition are presented.

The Robot System of the Manipulator
The robot manipulator is constructed with seven actuators (shoulder: three axes, elbow: one axis, and wrist: three axes) to mimic the human arm configuration, which is a redundant system. The actuator models used in the robot manipulator are PRL+120, ERB-145, and ERB-115, which are produced by SCHUNK Corporation (Mengen, Germany). The specifications of the actuator model are given in Table 5.

Manipulation Experiment in Foggy Condition
We performed a manipulation experiment in foggy conditions to verify the effectiveness of the descattering method in a real robotics application.

Experiment Environment
The experiment environment is illustrated in Figure 10. A LiDAR (MultiSense SL from Carnegie Robotics, Pittsburgh, PA, USA) is also placed in the experiment environment for comparison. We monitor the visibility with a laser-based visibility measurement system. To generate the fog, we used a fog machine with a power of 1500 W.
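For context, the visibility V and the attenuation coefficient c (used in Appendix A) can be related through the standard Koschmieder relation with a 2% contrast threshold, V = ln(1/0.02)/c ≈ 3.912/c. The paper does not state which convention its laser-based system uses, so this is only one common choice.

```python
import numpy as np

# Koschmieder relation with a 2% contrast threshold (a common convention;
# the paper does not specify the one used by its measurement system):
# V = ln(1/0.02) / c ~= 3.912 / c, with c in 1/m and V in m.
def visibility_from_attenuation(c):
    """Meteorological visibility (m) from attenuation coefficient c (1/m)."""
    return np.log(1.0 / 0.02) / c
```

Under this convention, a visibility of 2 m corresponds to a very large attenuation coefficient of roughly 2 m⁻¹, which is why the LiDAR fails in these conditions.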

Experiment Results
With the fog machine, a foggy condition where the visibility range is under 2 m can be generated in experimental setup 2, as explained in the previous section. As seen in Figure 11, the LiDAR works well in a clear environment. However, in the dense fog condition, it is unable to work.
With the proposed descattering-then-stereo algorithm, we are able to obtain a depth map. Based on the depth map, points of the valve are manually selected by the user. From these points (for example, 10 points), the center coordinate, normal vector, and radius of the valve are accurately extracted in a foggy condition, as shown in Figure 12. With the valve information, the mission to turn the valve is successfully performed, as shown in Figure 13. The operator controls the robot remotely using only the vision data. As shown in Figure 9, backscatter-corrupted images generate poor-quality depth maps. Therefore, although our method does not directly benefit the manipulation task, it helps provide higher-quality input images for stereo vision. More specifically, our method reconstructs denser depth maps, from which we can select more points from a larger variety of positions to produce a more accurate estimation.
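The extraction of the valve's center, normal vector, and radius from the selected points can be sketched as a least-squares plane fit (for the normal) followed by an algebraic circle fit in that plane. This is a generic sketch of one way to do it, not necessarily the exact procedure used on the robot.

```python
import numpy as np

def fit_valve(points):
    """Estimate the center, normal vector, and radius of a circular valve
    wheel from N >= 3 user-selected 3D points on its rim (shape (N, 3))."""
    centroid = points.mean(axis=0)
    # Plane normal = direction of least variance of the centered points.
    _, _, vt = np.linalg.svd(points - centroid)
    normal = vt[2]
    # 2D coordinates of the points in the fitted plane.
    u, v = vt[0], vt[1]
    p2 = np.column_stack(((points - centroid) @ u, (points - centroid) @ v))
    # Algebraic circle fit: x^2 + y^2 = 2*a*x + 2*b*y + c.
    A = np.column_stack((2.0 * p2, np.ones(len(p2))))
    (a, b, c), *_ = np.linalg.lstsq(A, (p2 ** 2).sum(axis=1), rcond=None)
    radius = np.sqrt(c + a ** 2 + b ** 2)
    center = centroid + a * u + b * v
    return center, normal, radius
```

With noisy stereo depth, a denser depth map lets the user pick rim points spread around the wheel, which conditions both the plane fit and the circle fit much better than a few clustered points.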


Figure 13. The snapshot of the robot turning the valve in dense fog condition.

Conclusions
In this paper, we present our descattering method, which can enhance images corrupted by strong non-uniform backscattering from an active illumination source. The method is very promising since it can enhance images for stereo vision and it is near real-time capable.
It is worth noting that our method is a model-based method. The proposed method and the method from [40] are based on pre-calibrated saturated backscattering; thus, it is not surprising that our method outperforms the methods from [35,36]. However, we have proposed a simple method that is able to enhance the images of dense fog or dense steam scenes very efficiently for stereo vision. The method is not restricted to our application. It can be utilized in other applications where active lighting is necessary, such as underwater robots.
An important issue in using our method is the choice of whether to use descattered images or defogged images: a uniform fog/steam environment requires descattered images, while a non-uniform environment requires defogged images. In practical operation, as mentioned in Section 5, the operator controls the robot remotely using the vision data, and the operator is also the one to make this decision. An algorithm to automatically detect a non-uniform (heterogeneous) fog environment would be an issue for future work.

Appendix A. Polarization-Based Backscattering Removal
Using artificial light that is close to the camera, as in our system, causes strong backscatter. To partly remove the strong backscattering, we utilize polarization, which has a proven excellent ability to reduce backscattering [23,24,46,47,48]. We simply modify the conventional image acquisition system to produce active polarization, similar to [23,24], by adding three polarizers: one mounted in front of the light source and one in front of each camera in the stereo vision system. To ensure photo-consistency, we aligned the polarizers in front of the two cameras so that they are in the same state. We use the term polarization angle for the angle between the state of the polarizer of the light source and the state of the polarizers of the two cameras. This system provides purely optics-based scattering removal for improving visibility. The best situation for backscattering removal is when the polarization angle is 90° [23,24]. Through our experiment, we found that it is very feasible to utilize the active polarizer. The active polarization lighting reduces both the saturated backscattering and the constant parameter k.
To estimate the parameter k, we first apply the initial step of our proposed descattering method, as in Equation (16). Feature extraction and matching are then done by employing speeded-up robust features (SURF) [49]. It should be noted that SURF is unable to detect any feature when applied to the original corrupted images. The constant k can be estimated as

k = (1/N) * sum_j k_j, (A1)

where N is the number of detected features; j is the feature index; and k_j can be easily obtained by inverting Equation (13), noting that the attenuated radiance in the left and right images is the same (Equation (A2)).
Figure A1 depicts an estimation of the constant k and the effect of the active polarization on k. Figure A1a shows an example of feature matching between the left and right images using SURF. From these matched features, k can be estimated by using Equations (A1) and (A2). The method was applied to images of different attenuation coefficients c and two different polarization angles, namely, 45° and 90°. To change the states of the polarization, we fix the polarizers of the two cameras and rotate the polarizer in front of the light source. Figure A1b shows that when c is smaller than 0.8 m−1, k and c are close to each other. Additionally, the values of k for different polarization angles are almost identical. When c is beyond 0.8 m−1, k in the solid lines increases dramatically with c. The dashed lines, however, have an almost linear relationship with c. We could not measure k in denser conditions since very few features can be detected and matched. However, it can be proven that the orthogonal setup of polarization between the illuminator and the receivers provides an advantage in scattering reduction because the rate at which S(x_i^obj) increases with x_i^obj is set by the parameter k. It is worth noting that k also depends on the lighting geometry; thus, the values shown in Figure A1 are only correct for the specific lighting geometry in our experiment. The method to measure c is explained in Section 4.1.
Figure A2 shows another advantage of using polarization. Besides reducing k, the orthogonal setup can reduce by as much as 50% the saturated backscattering of the 45° setup where the signal attains its highest value (Figure A2c). From these two figures, it is proven that, with a slight modification of the system, we can significantly remove the scattering signal. It is also well known that specular reflection often leads to problems in stereo matching. Therefore, by removing specular reflection, we can also improve stereo matching.
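The roughly 50% figure is consistent with Malus's law, under the idealizing assumption (ours, not the paper's) that the saturated backscatter largely preserves the source polarization: an analyzer at angle θ to the light's polarization transmits cos²θ of it, i.e., half at 45° and, ideally, none at 90°.

```python
import numpy as np

# Malus's law: the fraction of linearly polarized light transmitted by an
# analyzer at angle theta is cos(theta)^2. Under the (idealized) assumption
# that backscatter preserves the source polarization, the 45-degree setup
# passes half of it and the 90-degree (orthogonal) setup blocks it entirely.
def malus_transmission(theta_deg):
    return np.cos(np.deg2rad(theta_deg)) ** 2
```

Real backscatter is only partially polarized after multiple scattering events, which is why the orthogonal setup removes about half of the 45° setup's backscatter rather than all of it.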