A Bayesian Probabilistic Framework for Rain Detection

Heavy rain degrades the video quality of outdoor imaging equipment. To improve video clarity, image-based and sensor-based methods are adopted for rain detection. In the earlier literature, image-based detection methods fall into spatial and temporal categories. In this paper, we propose a new image-based method that exploits joint spatio-temporal constraints in a Bayesian framework. In our framework, the temporal motion of rain is assumed to be Pathological Motion (PM), which better suits the time-varying character of rain streaks. Temporal displaced frame discontinuity and a spatial Gaussian mixture model are used in the framework. An iterated expectation maximization method is used to estimate the Gaussian parameters. Pixel state estimation is carried out by an iterated optimization method in the Bayesian probability formulation. The experimental results highlight the advantage of our method in rain detection.


Introduction
The quality of video captured by outdoor electronic equipment can be heavily degraded by bad weather such as rain, snow, haze or fog. The degraded video imposes severe constraints on many video applications such as video tracking [1], object recognition [2], event detection [3], scene analysis [4], image registration [5], etc. To improve the results of such video processing, many recent works have focused on video degraded by bad weather. Among these works, rain detection has received much attention. To characterize and validate rain detection, many sensor-based and vision-based methods have been applied [6]. The sensor-based methods often rely on different frequency selections, scanning modes, and the application of radar for rain detection [7]. However, the application of sensor-based rain detection is limited by the cost of the sensors. In contrast, vision-based methods have wider applicability. Many image processing and computer vision methods pave the way for rain detection and removal.
In previously reported image-based methods, the physical properties and the spatio-temporal image characteristics of rain were applied efficiently. To characterize the photometry of rain, Garg [8] proposed a stochastic model based on the physical properties of rain. Since different camera adjustments (e.g., exposure velocity and focus distance) can improve the visual quality of rain-containing video, Garg [9] presented a method based on the adjustment of camera parameters. As a pioneering work, Garg [10] also proposed a realistic rain rendering technique. A rain distribution database can also be downloaded from his web site. A median filter method was proposed by Hase [11], which makes use of the temporal property of rain streaks. Zhang [12] extended this method by k-means clustering involving chromatic constraints. Brewer and Liu [13] combined the aspect ratio and the orientation of rain streaks into rain detection, which efficiently reduces false detections. Barnum et al. [14,15] proposed a global appearance model to formulate rain in the frequency domain. Moreover, an image-based processing method was proposed by Kang [16], which implements rain removal through image decomposition based on morphological component analysis (MCA) [17,18]. In Kang's method, image noise removal methods (e.g., the bilateral filter and the K-SVD dictionary training algorithm) [19][20][21][22][23][24] were used to highlight the advantage of the MCA-based algorithm. To improve the accuracy of rain streak detection, a histogram of orientation of rain streaks (HOS) was applied in Bossu's proposal [25]. A Gaussian-uniform mixture model and the expectation maximization (EM) algorithm [26] were adopted in Bossu's algorithm.
Overall, the state-of-the-art techniques for rain processing fall into two categories. The first category consists of spatial techniques. These techniques make full use of image spatial correlation, such as [16]. Rain streaks in an image or video are regarded as high-frequency information. Hence, the goal of a spatial method is to remove the high-frequency image information that contains rain streaks. To some extent, this is similar to some image denoising techniques. The other category contains temporal methods for processing rain streaks, in which temporally redundant information is applied for rain or snow detection. In methods such as [8,12,15], neighboring frames are incorporated into the detection framework according to the characteristics of rain streaks in the temporal domain. However, spatial and temporal methods rely on image/video spatial and temporal redundancy, respectively. Inspired by [27,28], we build a Bayesian framework to formulate rain or snow detection, which involves long-term temporal constraints and a prior distribution of rain or snow. To characterize rain detection, we harmonize the spatial and temporal considerations in our new Bayesian framework to make full use of the redundant image/video information. Spatial interpolation, copying of temporally relevant information, or spatio-temporal reconstruction is undertaken for rain removal under the guidance of a rain detection mask. The rain detection state is determined by the Bayesian maximum a posteriori (MAP) solution.
In this paper, the motion character of rain or snow is assumed to be Pathological Motion (PM), which is introduced in [27,28]. Before presenting the details of our method, we summarize the novel contributions of our paper, which include: (1) formulation of a Bayesian probabilistic framework that derives an estimate of the pixel state field from the maximum a posteriori (MAP) solution; (2) integration of the spatial and temporal likelihoods as well as an MRF prior into the Bayesian framework; (3) comparative analysis of our Bayesian method against previous methods.
The remainder of this paper is organized as follows. The algorithm is formulated in Section 2. The experimental results are shown in Section 3. Section 4 concludes the paper.

Description of Algorithm
In this section, we present our algorithm, which uses a Bayesian framework to formulate rain detection. For convenience of notation, we use I_n(x) to denote the illumination of the current video frame, where n is the frame number and x is the pixel index.

Temporal Discontinuity Description
We use a label field, l(x), to denote the pixel's state. l(x) = 1 means that the current pixel belongs to a rain streak, whereas l(x) = 0 means that pixel x lies in a non-rain region. Following the heuristics of [28], a temporal window of five frames, (I_{n−2}(x), I_{n−1}(x), I_n(x), I_{n+1}(x) and I_{n+2}(x)), is adopted in our algorithm. The displaced frame difference (DFD) between neighboring frames is used as the measure of temporal discontinuity in the five-frame window, which yields four DFDs per pixel. A temporal discontinuity field (TDF) is obtained from the four DFDs by thresholding: each TDF value is 1 where the absolute value of the corresponding DFD exceeds δ_t and 0 otherwise, where δ_t is a threshold for DFDs. There are therefore sixteen possibilities for the four TDFs. A state field s(x) is defined to describe all of the possibilities. Each s(x) is directly mapped to a value of l(x) (Table 1). Note that our mapping table, which targets rain detection, differs from Corrigan's [28]. If rain streaks exist in frame n, the absolute values of the DFDs between frame n and its neighboring frames will be large.
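The thresholding and state encoding described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the bit ordering used in `state_index`, the function names, and the `rain_states` set are assumptions, since the state-to-label mapping of Table 1 is not reproduced here. The example also uses plain frame differences in place of motion-compensated DFDs.

```python
def tdf_bits(dfds, delta_t):
    """Threshold four DFD values into four binary TDF flags."""
    if len(dfds) != 4:
        raise ValueError("expected four DFDs for a five-frame window")
    return [1 if abs(d) > delta_t else 0 for d in dfds]


def state_index(bits):
    """Pack four TDF flags into one of sixteen state indices (0..15).

    The bit ordering (first DFD = most significant bit) is an assumption.
    """
    s = 0
    for b in bits:
        s = (s << 1) | b
    return s


def label_from_state(s, rain_states):
    """Map a state to l(x); rain_states stands in for the paper's Table 1."""
    return 1 if s in rain_states else 0


# Intensities of one pixel over frames n-2 .. n+2 (hypothetical values):
# a bright streak appears only in frame n.
frames = [100, 102, 180, 101, 99]
dfds = [frames[i + 1] - frames[i] for i in range(4)]
bits = tdf_bits(dfds, delta_t=20)
s = state_index(bits)
```

Here both DFDs that involve frame n exceed the threshold, so the two middle flags are set. Whether that state is labeled as rain depends on the mapping table.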

Spatial Distribution of Rain Streaks
As shown in [25], the spatial distribution of rain streaks is a useful feature for the rain streak detection problem. In the proposed algorithm, a Gaussian-uniform mixture distribution is adopted for the orientation of gradient. We use G_x and G_y to represent the horizontal and vertical gradients of a pixel. The orientation of gradient, θ = arctan(G_y / G_x), is then modeled by a distribution ψ(θ) that mixes N(θ | µ, σ) and U(θ), where N(·) is a Gaussian distribution with mean µ and standard deviation σ, and U(θ) denotes a uniform distribution.
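As a rough illustration of this spatial model, the sketch below computes a gradient orientation and evaluates a Gaussian-uniform mixture density. The choices here are assumptions rather than details from the paper: orientations are folded into [0, π), the uniform component is taken over that same interval, and the mixing weight is named `weight`.

```python
import math


def gradient_orientation(gx, gy):
    """Orientation of the gradient, folded into [0, pi)."""
    return math.atan2(gy, gx) % math.pi


def gaussian_pdf(theta, mu, sigma):
    """Density of N(theta | mu, sigma)."""
    z = (theta - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_pdf(theta, weight, mu, sigma):
    """Gaussian-uniform mixture: weight * N + (1 - weight) * U on [0, pi)."""
    uniform = 1.0 / math.pi
    return weight * gaussian_pdf(theta, mu, sigma) + (1.0 - weight) * uniform
```

Folding with `% math.pi` treats a gradient and its negation as the same streak orientation, which is one common convention for orientation histograms.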

Probabilistic Formulation Framework
A Bayesian framework is built to estimate the unknown variable, s(x), from the posterior P(s(x) | Δ_n(x), θ_n(x)). For convenience of notation, the four DFDs are grouped into a vector-valued function Δ_n(x). The posterior is factorized in a Bayesian fashion as follows

$$P(s \mid \Delta_n, \theta_n) \propto P(\Delta_n \mid s)\, P(\theta_n \mid s)\, P(s) \qquad (3)$$

where the pixel index x has been omitted for clarity. In Equation (3), s(x) is considered a random variable. The values of l(x) can then be determined from the estimate of s(x) according to Table 1. There are two likelihoods associated with the framework, P(Δ_n | s) and P(θ_n | s). P(Δ_n | s) is the temporal likelihood, which is computed from the DFDs. P(θ_n | s) is the spatial likelihood, which is obtained from the spatial distribution of the orientation of gradient, ψ(θ). For convenience of computation, it is assumed that P(Δ_n | s) and P(θ_n | s) are statistically independent. The posterior probability is thus determined by the temporal likelihood, the spatial likelihood, and the prior. Following the heuristics of [28], the probabilistic formulation of the temporal likelihood is given in the following section. The spatial likelihood is formulated with a Gaussian mixture distribution of gradient orientation, and an EM method is employed to solve for the model parameters of this spatial probability model. A Gaussian MRF is used as the image prior. The details of the temporal likelihood, the spatial likelihood and the prior are given in the following sections.
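The factorization of the posterior into two likelihoods and a prior can be illustrated per pixel with the short sketch below. The function names are hypothetical, and the input lists are placeholders for whatever the temporal likelihood, spatial likelihood, and prior evaluate to for each candidate state. The example uses two states for brevity, whereas the framework uses sixteen.

```python
def posterior(temporal_lik, spatial_lik, prior):
    """Normalized posterior over states: P(D|s) * P(theta|s) * P(s) / Z."""
    unnorm = [t * g * p for t, g, p in zip(temporal_lik, spatial_lik, prior)]
    z = sum(unnorm)
    if z <= 0.0:
        raise ValueError("all states have zero posterior mass")
    return [u / z for u in unnorm]


def map_state(temporal_lik, spatial_lik, prior):
    """Index of the state that maximizes the posterior."""
    post = posterior(temporal_lik, spatial_lik, prior)
    return max(range(len(post)), key=post.__getitem__)


# Two candidate states with hypothetical likelihood and prior values.
post = posterior([0.2, 0.8], [0.5, 0.5], [0.5, 0.5])
best = map_state([0.2, 0.8], [0.5, 0.5], [0.5, 0.5])
```

Since the normalizing constant does not depend on s, the MAP state can equally be found from the unnormalized product.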

Temporal and Spatial Likelihood
The temporal likelihood P(Δ_n | s) is formulated following [28]. In this formulation, Δ_n is the DFD after motion compensation, σ_e^2 is the variance of the model error, and α acts as a threshold on temporal discontinuities. σ_e^2 is determined by estimating the variance of the DFDs when s(x) = 0. For clarity, the determination of the threshold α is omitted here; more details can be found in [28].
The spatial likelihood P(θ|s) represents the gradient orientation distribution over rain regions. Based on the model introduced in Section 2.2, we formulate the spatial likelihood P(θ|s) from the Gaussian-uniform mixture, where µ and σ are unknown random variables (i.e., the model parameters of the orientation of gradient for rain streaks). Before solving for the posterior probability P(s | Δ_n, θ_n), µ and σ need to be estimated. An expectation maximization (EM) algorithm [26] is adopted to estimate the model parameters µ and σ. Given a computed gradient angle θ_i, the k-th expectation step computes the estimate ẑ_i, and the maximization step then updates the parameter estimates such as µ^k, where y_i samples are observed for a given θ_i. The selection of initial values and the verification of convergence are described in [25].
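The EM estimation can be sketched as below. This is a standard EM for a Gaussian-uniform mixture fitted to a histogram of orientations, under stated assumptions: the uniform component covers [0, π), `counts[i]` plays the role of y_i, and z_i is taken to be the posterior probability that bin i belongs to the uniform component. The exact update equations of the paper are not reproduced here, so this should be read as an illustrative stand-in.

```python
import math


def gaussian_pdf(theta, mu, sigma):
    z = (theta - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def em_gaussian_uniform(thetas, counts, mu, sigma, weight, iterations=50):
    """Fit weight * N(mu, sigma) + (1 - weight) * U[0, pi) to a histogram.

    thetas: bin centres; counts: samples per bin (y_i).
    Returns the estimated (mu, sigma, weight).
    """
    uniform = 1.0 / math.pi
    n_total = sum(counts)
    for _ in range(iterations):
        # E-step: probability that each bin belongs to the uniform part.
        z = []
        for t in thetas:
            g = weight * gaussian_pdf(t, mu, sigma)
            u = (1.0 - weight) * uniform
            denom = g + u
            z.append(u / denom if denom > 0.0 else 1.0)
        # M-step: re-estimate parameters from Gaussian responsibilities.
        resp = [(1.0 - zi) * y for zi, y in zip(z, counts)]
        total = sum(resp)
        if total <= 0.0:
            break
        mu = sum(r * t for r, t in zip(resp, thetas)) / total
        var = sum(r * (t - mu) ** 2 for r, t in zip(resp, thetas)) / total
        sigma = max(math.sqrt(var), 1e-3)  # floor avoids a collapsed Gaussian
        weight = total / n_total
    return mu, sigma, weight
```

The variance floor is a practical safeguard added for this sketch. Initial values matter in practice, and [25] discusses their selection.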

Prior P(s)
The prior P(s) completes the Bayesian framework. The selection of the prior model is very important to the final results in a Bayesian framework. To maintain spatial and edge consistency, we apply a Markov Random Field (MRF) [29], which asserts that the conditional probability of a pixel depends only on its neighbors. In this paper, we use a Gaussian MRF to model P(l(x)), which is characterized by a local conditional probability density function of exponential form with a normalizing factor Z(i). Here N(·) denotes the neighborhood of pixels centered on pixel x, and ‖I(x) − N(x)‖_{2,G_σ} is the L2 norm of the difference between I(x) and N(x), weighted by a Gaussian G_σ. The parameter h controls the decay of the exponential function. In [28], a penalty term is introduced into the prior expression to improve the accuracy of PM detection. However, incorporating the penalty term does not produce better results in our case. Therefore, to avoid redundant computation, the penalty term has not been adopted.

MAP Solving
An estimate for l(x) is found by computing the MAP estimate of s(x) using the Iterated Conditional Modes (ICM) algorithm [30]. The ICM algorithm gives a sub-optimal estimate of s(x): the converged estimate represents a local optimum of the posterior formulation. A good initialization of the unknown random variables is therefore necessary to ensure that the converged result is close to the global optimum. For this reason, a multi-resolution scheme is incorporated into the algorithm. Using multiple resolutions allows faster convergence of the state field s(x), and the final result is more likely to converge to the global maximum. A hierarchical pyramid [31] of differing resolutions is constructed (Figure 1). At each level of the pyramid, the resolution is down-sampled by a factor of two in each dimension relative to the level below. The algorithm proceeds by initializing the random variables at the coarsest level of the pyramid. An estimate of s(x) at the coarsest level (four levels are used) is obtained from the probabilistic framework, and the new estimate is then used to initialize the framework at the level below. This process continues until s(x) has been estimated at full resolution. Notably, before solving for l(x), the EM estimation described in Equations (6) and (7) needs to be completed in order to evaluate the spatial likelihood in the posterior expression.
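The coarse-to-fine ICM procedure can be illustrated with the simplified sketch below. Several simplifications are assumptions made for brevity: the per-pixel costs are given as negative log-likelihoods in `unary`, a Potts smoothness term with weight `beta` stands in for the Gaussian MRF prior, and `upsample2` uses nearest-neighbour replication to pass a coarse estimate to the next finer level.

```python
def icm(unary, labels, beta=1.0, max_sweeps=10):
    """Iterated Conditional Modes on a 4-connected grid.

    unary[y][x][k] is the cost (negative log-likelihood) of label k.
    A Potts penalty beta is paid for every neighbour with another label.
    """
    h, w = len(unary), len(unary[0])
    n_labels = len(unary[0][0])
    labels = [row[:] for row in labels]
    for _ in range(max_sweeps):
        changed = False
        for y in range(h):
            for x in range(w):
                costs = []
                for k in range(n_labels):
                    cost = unary[y][x][k]
                    for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and labels[ny][nx] != k:
                            cost += beta
                    costs.append(cost)
                best = min(range(n_labels), key=costs.__getitem__)
                # Only switch on a strict improvement, so ties keep the label.
                if costs[best] < costs[labels[y][x]]:
                    labels[y][x] = best
                    changed = True
        if not changed:
            break
    return labels


def upsample2(labels):
    """Nearest-neighbour 2x upsampling to initialize the next finer level."""
    out = []
    for row in labels:
        wide = [v for v in row for _ in (0, 1)]
        out.append(wide)
        out.append(wide[:])
    return out
```

In a full pipeline, `icm` would be run at the coarsest level first, and its result passed through `upsample2` to initialize the level below, repeating until full resolution is reached.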

Experiments
To validate our proposed algorithm, we compared our method with [25] and [8]. In our implementation of [25], the Gaussian mixture model [32] and an approximated histogram of orientation of rain streaks are adopted. Our implementation of [8] uses the characteristics of rain streaks across neighboring frames. To evaluate the accuracy of the detection algorithm, the test video sequences contain illumination variations, camera motions, moving objects, etc. In our proposed algorithm, temporal and spatial constraints are unified into a maximum a posteriori (MAP) computation. The assumption in the temporal domain is pathological motion, and the assumption in the spatial domain is consistency of gradient orientation. All constraints are combined in the final estimation of the unknown variables. The algorithm was implemented with Microsoft Visual Studio 2010 and OpenCV 2.3. The hardware configuration consists of an Intel Core(TM) i5-4200 (1.6 GHz) and 4 GB RAM. The operating system is Windows 7. Under this configuration, the average processing speed of our proposed method is about 8 images per second for 720 × 480 resolution sequences, whereas the methods of [25] and [8] run at close to 5 images per second. That is, we obtain a higher processing speed due to simplified motion estimation and ICM solving. In addition to the speed benefit, we also show the results of subjective and objective assessments in the following paragraphs.
Figure 2a is the original video frame. Figure 2e is the detection mask obtained by method [25]. Figure 2b is the rain-removal frame using the detection mask of Figure 2e. Figure 2f is the detection mask obtained by method [8]. Figure 2c is the rain-removal frame using the detection mask of Figure 2f. Figure 2g is the detection mask obtained by our proposed method. Figure 2d is the rain-removal frame using the detection mask of Figure 2g. From Figure 2 to Figure 6, it can be seen that our method gives more plausible results than [25] and [8]. The advantage is especially clear in Figures 3-6. In addition to the subjective test, an objective test is given in Figure 7 to compare detection accuracy. Figure 7a is a non-rain image. Figure 7b is a synthetic image with rain added using image editing software. Figure 7c is the ground truth of the detection mask. Figure 7d is our detection mask. Figure 7e and Figure 7f are the results of [25] and [8].
We use the number of detected rain pixels as the measure of detection accuracy, based on a comparison between the ground truth and the test results. For our method, 176,699 pixels are detected as rain, whereas 7366 and 9564 are detected using [25] and [8]. Therefore, our proposed algorithm has both higher detection accuracy and higher speed. In this objective test, the false detection rate is not considered, because rain detection is more important than non-rain detection in our image processing application. Overall, these experimental examples showcase the benefits of our algorithm. Image/video spatial and temporal constraints are applied in [25] and [8], respectively. However, the spatial distribution and the temporal motion of rain streaks are equally important for rain detection. The methods of [25] and [8] built their rain detection frameworks on spatial and temporal analysis, respectively. In contrast, we harmonize a temporal constraint and the spatial distribution of gradient orientation in our Bayesian probabilistic framework. To make full use of image self-similarity and maintain image smoothness across or within edges, as well as to strengthen the correlation of neighboring pixels, a transformed MRF is utilized in the Bayesian framework. Therefore, compared with previous methods, our method enables superior detection by combining spatial and temporal constraints, both subjectively and objectively.
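The pixel-count comparison against a ground-truth mask can be sketched as follows. The mask representation (nested lists of 0/1 values) and the function names are assumptions for illustration. In line with the objective test above, only correctly detected rain pixels are counted and false detections are ignored.

```python
def count_rain(mask):
    """Number of pixels marked as rain in a binary mask."""
    return sum(1 for row in mask for v in row if v)


def detection_rate(detected, truth):
    """Fraction of ground-truth rain pixels that were also detected."""
    hits = sum(
        1
        for d_row, t_row in zip(detected, truth)
        for d, t in zip(d_row, t_row)
        if d and t
    )
    total = count_rain(truth)
    return hits / total if total else 0.0


# Toy masks (hypothetical values, not the data of Figure 7).
truth = [[1, 1, 0], [0, 1, 0]]
detected = [[1, 0, 0], [0, 1, 1]]
rate = detection_rate(detected, truth)
```

In this toy example, two of the three true rain pixels are detected. The extra pixel in `detected` does not affect the rate because false detections are not penalized.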