1. Introduction
In field robotics, visual recognition is fundamental to robot navigation and task performance. However, images can be degraded by poor atmospheric conditions or underwater environments, which causes problems for camera-based visual recognition such as haze effects, contrast loss, and color distortion. Visibility enhancement of these degraded images is important for many robotic applications, such as SLAM (simultaneous localization and mapping) [1,2], object recognition and grasping [3], and underwater robotics [4,5]. Many previous studies of dehazing [6,7,8,9,10] are biased toward restoring the visibility of a single image for monitoring. However, robot navigation with real hazy images causes false localization and mapping, which eventually leads to critical problems for the actual self-driving ability of robots. Generally, degraded images are generated by light absorption, scattering by particles, water droplets, and many other external factors. Moreover, images captured in a fiery, smoke-laden room or underwater are severely damaged by very dense turbidity and are limited to short ranges.
In the early stages of image dehazing for atmospheric images, logarithmic image processing and contrast enhancement methods were widely applied to achieve better visibility of scenes. There are many related works on turbid atmospheric conditions [6,7,11]. An effective single-image dehazing method using Independent Component Analysis (ICA) was introduced in [6]. ICA was the first breakthrough in haze removal from single images. However, it does not work well on grayscale images and requires substantial computational time. He et al. [7] demonstrated single-image dehazing using the Dark Channel Prior (DCP). This idea relies on a strong prior that at least one of the channels (i.e., of R, G, and B) of each pixel is low in haze-free images. DCP-based dehazing is effective and simple, but the channel prior is weak for sky regions and underwater images. Moreover, it is hard to operate in real time because its soft-matting technique requires a large amount of computation. He et al. [11] proposed the guided filter for time efficiency, but the weakness in sky regions and underwater remains.
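For illustration, a minimal sketch of computing the dark channel itself (the quantity behind the prior described above) is shown below; the patch size and the naive minimum filter are illustrative choices, not the reference implementation of [7].

```python
import numpy as np

def dark_channel(image: np.ndarray, patch: int = 15) -> np.ndarray:
    """Per-pixel minimum over RGB, then a local minimum filter over a patch.

    image: H x W x 3 array with values in [0, 1]; patch size is illustrative.
    In haze-free images this map is close to zero almost everywhere.
    """
    h, w, _ = image.shape
    min_rgb = image.min(axis=2)                      # darkest channel per pixel
    pad = patch // 2
    padded = np.pad(min_rgb, pad, mode='edge')
    dark = np.empty_like(min_rgb)
    for y in range(h):                               # naive local minimum filter
        for x in range(w):
            dark[y, x] = padded[y:y + patch, x:x + patch].min()
    return dark
```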
Dehazing methods for underwater scenes have also been developed to enhance visibility for underwater exploration. An underwater single-image dehazing method using the wavelength-dependent attenuation of light in water was developed in [9]. It calculates a depth prior from the strong difference in attenuation between color channels and then uses it to recover the scene radiance via a Markov random field (MRF). Ancuti et al. [10] proposed multi-scale fusion-based image dehazing with two generated inputs. It is time-effective for underwater images; however, it cannot construct a relative depth or transmission map. Cho et al. [12] utilized a Gaussian process for depth estimation, but the performance was limited because the method did not consider the information level of the incoming depth data. Berman et al. [13] proposed a haze-line-based approach to estimate ambient color and transmission in color-biased environments. There are also studies that have improved image information using sparse depth obtained from robot platforms. Babaee and Negahdaripour [14] used sparse sonar depth in an underwater environment for a hybrid dehazing method that fuses optical and acoustic images. Building on previous sensor-fusion research, they matched the partial depth of imaging sonar to optical images and applied an MRF with the intensities of the optical and acoustic images to estimate dense depth maps. This approach can predict a real-scale depth map from sonar images and reconstruct dehazed images from strongly hazed optical images. However, the image enhancement is weaker, and the MRF-based estimation slows it down. Cho and Kim [15] exploited Doppler velocity log (DVL) sparse depth to enhance underwater images. Because planar scenes must be assumed, the method has difficulties in non-planar situations.
As research related to deep learning has increased, dehazing research for the underwater environment has also increased. These studies attempt to restore the image itself using the network structure of a CNN (convolutional neural network), GAN (generative adversarial network), or ViT (vision transformer). In [16], the authors performed image restoration toward clean underwater images using a CNN structure designed for underwater images (UWCNN). This paper introduced an underwater image synthesis method covering various water types and turbidity levels and utilized a structural similarity loss to preserve the original textures of images. Wang et al. [17] proposed CNN-based image enhancement incorporating both RGB and HSV color space images. This method utilizes an HSV color transform in the network to compensate for the weakness of the RGB color space, which is insensitive to the luminosity and saturation of the image. This method can remove color cast and preserve detailed information. Fabbri et al. [18] performed image enhancement using a GAN. CycleGAN was used to form ground-truth and distorted image pairs: given a distorted image, CycleGAN converts it as if it came from the domain of undistorted images, and the created image pairs are used for image reconstruction. FUnIE-GAN, based on the conditional GAN framework, was introduced in [19]. The loss function of FUnIE-GAN evaluates the perceptual quality of images, such as global similarity, content, local texture, and style. The method can be applied to both paired and unpaired images.
There are also several studies on aerial image dehazing with deep learning-based approaches. Li et al. [20] proposed a lightweight CNN to convert hazy images to clean images. Song et al. [21] presented DehazeFormer, a combination of the Swin Transformer and U-Net with several critical modifications, including the normalization layer, activation function, and spatial information aggregation scheme. Qin et al. proposed a feature fusion attention network (FFA-Net) using local residual learning and feature attention modules, which give more weight to more important features. Liu et al. [22] introduced a CNN structure for single image dehazing; to alleviate the bottleneck problem, it applies attention-based multi-scale estimation on a grid network.
From previous methods, we can deduce several objectives for dehazing in robotics problems. First, the method should run in real time and 'learn' data online. Second, the algorithm should work regardless of the number of channels. The last objective involves the estimation of the normalized depths of images; depth information is important and necessary not only for dehazing but also for mapping.
In this paper, we propose an online image dehazing method that uses partial and sparse depth data with iGP depth estimation. We achieve a general method for both color and grayscale images with minimal user parameters. To estimate a reliable depth, we use partial depth information obtained from low-level sensor fusion. As mentioned in [14], a dense depth map is needed to reconstruct haze-free images in a strongly turbid medium, and it is easy to obtain partial depth data for optical images by simple sensor fusion. In comparison with [14], the proposed method only requires low-level fusion with any range sensor. Our method chooses the most informative input data points, which cover the vulnerable regions of the estimated depth, and automatically sets dehazing parameters such as airlight and transmission. The evaluation compares the proposed method with previous methods in terms of qualitative results (visibility) and quantitative results (feature matching). In summary, the contributions of our method are as follows:
Online depth learning using iGP;
Information theoretic data point selection;
Quantitative measures for dehazing qualities based on information metrics;
Independence of the channel number (both color and gray images).
This paper proceeds as follows. In Section 2, we introduce a general atmospheric scattering model for dehazing. The method is described in Section 3 and Section 4. We present experimental results in Section 5.
4. Information Enhanced Online Dehazing
In many robotics applications that use range measurements, a very sparse range prior is often available. For instance, Light Detection and Ranging (LIDAR) on an aerial/ground platform may be sparse, and a DVL provides four useful range measurements from its beams together with the velocity of an underwater vehicle. In this section, we discuss intelligent training point selection using information measures. We propose simultaneous dense depth estimation and dehazing with information-enhanced iGP. For efficient depth estimation with iGP, the training points need to be selected intelligently in order to keep the accumulated training inputs minimal.
Two types of MI are introduced. First, we measured the MI between previous training inputs and newly obtained inputs and chose the more informative points among the new input candidates. Second, we measured the MI of the GP model as in [26], which reveals the information levels of the candidate points. In this work, we used the latter measure when (i) verifying training inputs and (ii) deciding on a stop criterion.

Although the proposed method can be applied to any type of range measurement, we illustrate it assuming one of the common sensor configurations in [27], with incrementally updated vertical depth points, for explanation. Note that the proposed method is applicable to general range data.
4.1. Active Measurement Selection
Let X_t = {x_1, ..., x_n} be the n existing training inputs accumulated up to step t, and let X_new = {x_1^new, ..., x_m^new} be the m newly added training input candidates at the current time step t. A single input datum is defined as a five-dimensional vector for a color image and a three-dimensional vector for a grayscale image; each vector consists of a pixel location (u, v) and either three color channels (r, g, b) or an intensity channel (i).
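For concreteness, a minimal sketch of assembling such training inputs from an image and a set of pixel locations with known range is given below; the function name, normalization, and array conventions are illustrative rather than taken from the original implementation.

```python
import numpy as np

def build_inputs(image: np.ndarray, pixels: np.ndarray) -> np.ndarray:
    """Stack pixel coordinates and color (or intensity) into GP input vectors.

    image:  H x W x 3 (color) or H x W (grayscale), values in [0, 1]
    pixels: M x 2 integer array of (row, col) locations with known depth
    Returns M x 5 inputs for color images or M x 3 for grayscale images.
    """
    pixels = np.asarray(pixels, dtype=int)
    rows, cols = pixels[:, 0], pixels[:, 1]
    if image.ndim == 3:                       # (u, v, r, g, b)
        feats = image[rows, cols, :]
    else:                                     # (u, v, i)
        feats = image[rows, cols][:, None]
    return np.hstack([pixels.astype(float), feats])
```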
Informative training data point selection is essential for deciding which point best covers the weakly estimated regions of previous training. This motivates a new metric for choosing the best point. The metric has two objectives: first, it should be computationally efficient; second, it should produce meaningful information between inputs. One of the key features of a GP is the estimation variance, which can be used for data selection. Intuitively, test inputs that are farthest from the training inputs have the largest uncertainties, meaning that new training inputs should be uncorrelated with previously selected inputs. By the characteristics of a GP, the input points jointly form a multivariate Gaussian distribution; therefore, we can use the MI of the Gaussian distribution. The MI of two input points, x_i and x_j, can be defined as

MI(x_i, x_j) = -(1/2) log(1 - k_ij^2 / (k_ii k_jj)).      (7)

As shown in (7), MI is a function of the covariance values from the Gram matrix, i.e., the (i, j) element (k_ij) and the diagonal elements (k_ii, k_jj) of the Gram matrix up to time t. Given the current Gram matrix K_t and m newly added training points, the naive operation on the Gram matrix handles the whole (n + m) x (n + m) matrix, which can be significantly inefficient. Our finding is that the correlation between the existing and newly added inputs has the major impact. This correlation is described in Figure 2 as the cross-covariance submatrix between existing and new inputs. To evaluate the information of each newly added data point, we introduce the Sum of Mutual Information (SMI) for each candidate by summing each row of this submatrix (i.e., the summation of the MI between the jth candidate and all previous training points). For example, the SMI for the jth input candidate is defined as

SMI_j = sum_{i=1}^{n} MI(x_i, x_j^new).      (8)
This contains the correlation between the current candidate point x_j^new and all previous inputs. This metric is accumulated for every candidate, and we pick the best one, i.e., the candidate that is least correlated with the previous inputs (9). For updating the Gram matrix, we update the corresponding Gram submatrix as in Algorithm 1 (line 5).
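A minimal sketch of this selection step is shown below, assuming the bivariate-Gaussian form of the MI in (7) and a precomputed cross-covariance block; the function and variable names are illustrative, not from the original implementation.

```python
import numpy as np

def pairwise_mi(K_cross: np.ndarray, k_old_diag: np.ndarray,
                k_new_diag: np.ndarray) -> np.ndarray:
    """MI between every (existing, candidate) input pair under a joint Gaussian.

    K_cross: n x m cross-covariances k(x_i, x_j^new)
    k_old_diag: n variances k(x_i, x_i); k_new_diag: m variances k(x_j, x_j)
    """
    rho2 = K_cross**2 / np.outer(k_old_diag, k_new_diag)   # squared correlation
    rho2 = np.clip(rho2, 0.0, 1.0 - 1e-12)
    return -0.5 * np.log(1.0 - rho2)                        # Eq. (7)

def select_least_correlated(K_cross, k_old_diag, k_new_diag) -> int:
    """Pick the candidate whose Sum of Mutual Information (SMI) is smallest."""
    smi = pairwise_mi(K_cross, k_old_diag, k_new_diag).sum(axis=0)   # Eq. (8)
    return int(np.argmin(smi))                                        # Eq. (9)
```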
We compared the quality of three possible candidates in Figure 2 and Figure 3. From the SMI bar in Figure 2, we considered three types of selection criteria (proposed, random, and mutually related). Figure 3 describes the toy example results. In this validation, we assumed that depth information was given as a vertical line (the yellow line on the image) at each time step. Among the raw inputs for selecting the candidates, we chose the best one and evaluated three measures: error mean, maximum uncertainty, and MI of K. The graph shows the depth estimation as the selection criterion differs. Using the proposed point selection, the average error drops faster, the largest uncertainty is lower, and the MI increases faster than with the other two methods (random, correlated).
Algorithm 1 Active point(s) selection
1: Input: input image I, previous inputs X_t, new input candidates X_new, step T
2: Compute-K(X_t, X_new)
3: Compute-SMI ▹ (8)
4: select the least-correlated candidate ▹ (9)
5: Update-K with the selected input
6: Output: Gram submatrix, selected input
4.2. Stop Criterion
The more training data we used, the better the GP regression results. However, memory and computational issues arise as the training set grows. Therefore, we also propose a stop criterion for training based upon the information of the Gram matrix. A similar approach was presented in [26], which used MI for GP optimization. Contal et al. [26] introduced upper bounds on the cumulative regret via an MI-based quantity. Motivated by this, we reversely utilized the increment of MI to decide the quality of the training inputs and the GP model, computing a similar metric but based on the Gram matrix of the selected training inputs. We computed the increment of this information measure between two consecutive time steps, t - 1 and t. By comparing the two increments, a stop flag was activated if the increment at step t dropped below a particular rate (gamma) of the previous step. We used gamma = 0.8 and stopped training if the increase of information was slower than 80% of the previous update. The detailed procedure of dense depth map estimation with informative point selection is described in Algorithm 2.
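A compact sketch of this criterion is given below, with gamma = 0.8 as in the text; the variable names and the list-based bookkeeping are illustrative assumptions.

```python
def should_stop(info_history: list, gamma: float = 0.8) -> bool:
    """Stop when the latest information gain falls below gamma times the previous gain.

    info_history: accumulated information metric I_1, ..., I_t (one value per step).
    """
    if len(info_history) < 3:           # need two consecutive increments to compare
        return False
    d_prev = info_history[-2] - info_history[-3]
    d_curr = info_history[-1] - info_history[-2]
    return d_curr < gamma * d_prev
```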
Algorithm 2 Online Dehazing with iGP
1: Input: input image I, set of training inputs X, set of training outputs Y, number of point sets
2: initialize the step counter, training set, and stop flag
3: while (point sets remain) and (stop flag not set) do
4:   if first iteration then
5:     Initialize-K(X)
6-10:  build the initial GP model (QR-Decompose the Gram matrix and compute the posterior)
11:  else
12-13: ActivePointSelect(I, X, X_new) ▹ Algorithm 1
14-20: incrementally update the Gram matrix, QR-Decompose it, and recompute the posterior
21:    StopCriterion ▹ Section 4.2
22:  end if
23: end while
24-25: MatToArray(I) to obtain the test inputs for all pixels
26:  Compute-K between the training and test inputs
27:  ComputeGP-Mean to predict the depth at every pixel
28: Output: dense depth map
4.3. Dense Depth Map-Based Haze Removal
Having predicted the dense depth map, we could reconstruct the haze-free image via (1). Unknown parameters are airlight
A and attenuation coefficients
. First,
A can be estimated from the depth map and an input image motivated from [
8]. The dense depth map was already prepared; therefore, we picked pixels from 0.1% of the deepest points in the depth map and selected the pixel color that had the maximum brightness. These pixel color values were used for the
A input image. Moreover, we estimated
by fitting the transmission model with brightness decaying nearest (
) and the deepest (
) pixels. In other words, the estimated
is
. The last part of the image dehazing is white balance. It is not effective for a normal aerial image with airlight near white; however, it helps restore biased color for underwater and indoor environments. The dehazing algorithm is described in Algorithm 3. It takes input image
and depth map
, and returns dehazed image
.
Algorithm 3 Image Dehaze
1: Input: input image I, dense depth map D
2: EstimateAirlight at the nearest pixels of D
3: A ← EstimateAirlight at the deepest pixels of D
4: beta ← EstimateBeta from the brightness decay between the two
5: Output: J ← ApplyDehazing(I, A, beta)
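Putting the final step together, a minimal sketch of reconstructing the haze-free image from the dense depth map under the scattering model in (1) is shown below, using the 0.1% deepest-pixel airlight estimate described above; the helper name, the transmission clamp, and the assumption of a color image are illustrative.

```python
import numpy as np

def dehaze(image: np.ndarray, depth: np.ndarray, beta: float,
           t_min: float = 0.1) -> np.ndarray:
    """Invert I = J*t + A*(1 - t) with t = exp(-beta * depth).

    image: H x W x 3 array in [0, 1]; depth: H x W dense depth map.
    """
    # Airlight A: brightest color among the 0.1% deepest pixels.
    k = max(1, int(0.001 * depth.size))
    deep_idx = np.argsort(depth.ravel())[-k:]
    deep_pixels = image.reshape(-1, image.shape[-1])[deep_idx]
    A = deep_pixels[np.argmax(deep_pixels.sum(axis=1))]

    t = np.exp(-beta * depth)[..., None]             # transmission map
    t = np.clip(t, t_min, 1.0)                        # avoid division blow-up
    J = (image - A) / t + A                           # recovered scene radiance
    return np.clip(J, 0.0, 1.0)
```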
6. Conclusions
This paper focused on iGP-based real-time haze removal with sparse partial depth cues. We first set the kernel function as a mixture of SE and NN kernels for better estimation performance on real datasets with sudden spatial changes. Moreover, a new SMI metric was introduced for selecting the best points among newly added inputs. With this information measure, we can avoid unnecessary training and keep the estimation model efficient; thus, the algorithm can operate in real time. In addition, having obtained a depth map, we estimated the parameters (A, beta) of the dehazing model automatically. For evaluation, we tested the method on synthetic fog and real indoor haze datasets (for color and gray images). In addition, we applied the proposed method to underwater images.
A few problems remain, however, for real applications. First, the dehazing parameter estimation is a rough approximation. In our experiments, this approximation was sufficient to predict reasonable parameter values; however, it is necessary to test this part in various environments. Second, future work involves constructing a global GP model for robust estimation using sequential and multiple input images.