Variational Model for Single-Image Reflection Suppression Based on Multiscale Thresholding

Reflections often cause degradation in image quality for pictures taken through glass medium. Removing the undesired reflections is becoming increasingly important. For human vision, it can produce much more pleasing results for multimedia applications. For machine vision, it can benefit various applications such as image segmentation and classification. Reflection removal is itself a highly illposed inverse problem that is very difficult to solve, especially for a single input image. Existing methods mainly rely on various prior information and assumptions to alleviate the ill-posedness. In this paper, we design a variational model based on multiscale hard thresholding to both effectively and efficiently suppress image reflections. A direct solver using the discrete cosine transform for implementing the proposed variational model is also provided. Both synthetic and real glass images are used in the numerical experiments to compare the performance of the proposed algorithm with other representative algorithms. The experimental results show the superiority of our algorithm over the previous ones.


Introduction
For most multimedia and computer vision applications, the input images are normally assumed to be both clean and clear. However, bad lighting conditions and environments such as photographing behind a window glass or showcase are almost inevitable in our daily life. In these scenarios, the captured images often contain undesired reflections from the objects on the camera side, which severely degrade image quality and affect the performance of multimedia and computer vision applications, such as image segmentation and classification.
Due to the rapid growth of mobile devices, a fast and effective reflection removal algorithm is increasingly desired. Reflection removal is the restoration process that aims essentially at removing or suppressing unpleasing reflections and improving the visibility of the scenes in front of the glass medium. As a preprocessing step, it can benefit various follow-up applications.
Barrow et al. [1] first proposed a linear assumption for an observed degraded image Y as a superposition of two layers as follows: where T and R are the transmission and reflection layers, respectively. The inverse problem of resolving T and R given Y is highly ill-posed, since there is only one equation for two unknowns per pixel. According to the number of input images Y used, reflection removal methods can be classified into multiple-image and single-image approaches, where the former makes the problem less ill-posed while the latter is itself highly ill-posed and challenging. Since perfectly separating the transmission and reflection layers is, in general, very difficult even for the case of multiple input images and, in most situations, people tend to care only about the transmission layers, estimating a reasonable T that might contain

Related Work
As introduced in Section 1, reflection removal is a highly ill-posed inverse problem that is very challenging to solve, especially for the case of a single input image. Arvanitopoulos et al. [2] tried to suppress the reflection instead of removing the reflection. They exploited a Laplacian data fidelity term and a sparse gradient prior, which achieve satisfactory quality for reflection suppression. However, since their model is nonconvex, their algorithms are quite inefficient and require a large number of iterations to converge to a desirable result.
As an improvement of [2], Yang et al. [3] proposed a hard thresholding operation to the gradient of the input image in the Laplacian data fidelity term. Their method also achieves satisfactory quality for reflection suppression, but the algorithm is highly efficient compared with [2]. Since our method is an improvement of [3], we briefly review the works of [2,3] in the following two subsections.

Smooth Regularization on T
The algorithm proposed in [2] for reflection suppression mainly relies on the critical assumption that reflection edges (gradients) are smaller in magnitude and less in focus compared to transmission edges. This assumption is reasonable in real-world scenarios since the camera focuses on transmission objects rather than reflection components. Mathematically, this assumption can be expressed as the following formation model [21]: where • and * denote the component-wise multiplication and the convolution operation, respectively. W is the contribution weight of the transmission T to the observation Y, and k σ is a Gaussian blur kernel with standard deviation σ performed on the reflection layer R. In general, W is pixel-dependent and depends on the lighting conditions and the distance of the camera to the captured scene. For simplicity, weight W can be considered as a constant, i.e., W(x) = w, ∀x ∈ Ω, where Ω and x = (x 1 , x 2 ) denote the image domain and a pixel point, respectively. Following the successful smoothing model of Xu et al. [22], Arvanitopoulos et al. [2] employed the number of non-zero gradients as the prior information to restrict and regularized output transmission. Their model can eliminate small gradients and simultaneously preserve large image edges. The original minimization problem in [22] is as follows: where the following: is the L 2 data fidelity term which keeps the optimal T close to the observation Y, and the following: is the L 0 regularization term where denotes the cardinality measure applied on a given set (i.e., a measure of the number of elements of the set). The L 0 term regularizes the optimal T by implicitly performing a hard thresholding of its gradients with threshold λ. Since high frequency details of the image will be eliminated, model (3) is suitable for image smoothing but unsuitable for reflection suppression.
To preserve high frequency details in optimal T, Arvanitopoulos et al. [2] proposed a Laplacian data fidelity term to replace (4), where the Laplacian of the optimal T is defined by the following.
Their data fidelity term based on the Laplacian to enforce the consistency in fine structures is then given as follows: where regularization term C(T) is the same as defined in (5). The new model can preserve strong edges and details in transmission T while eliminating reflection R simultaneously; thus, it is more suitable for reflection suppression.

Direct Thresholding on Y
In [2], the edge information of the optimal T is preserved by applying the Laplacian differential operator L. The reflection component R is suppressed by applying regularization C on T. As the regularization parameter λ increases, more reflections are suppressed in T. However, as their model defined in (7) is nonconvex, their operator splitting algorithm for minimizing the nonconvex energy functional is highly inefficient and takes a large number of iterations to converge to a desired solution. Therefore, Yang et al. [3] integrated these ideas into a new model formulation, which significantly reduces computation times while retaining satisfactory reflection suppression performances.
Instead of using the hard thresholding operation derived implicitly from the smoothing regularizer C on T, Yang et al. [3] adopted the idea from [23,24] to directly integrate the hard thresholding operation for gradients on Y into the energy functional, which is expressed as follows: min where δ h is the hard thresholding operation for gradients and is defined as follows.
In the first term of (8), the gradient of the observation Y, i.e., ∇Y, is thresholded using δ h before taking the divergence operator div. Since ∇Y(x) with magnitude |∇Y(x)| smaller than h will be truncated to zero, the model has the ability to suppress reflections in T. Since the data term also enforces the consistency of T to Y via Laplacian operator L, fine details of Y will be preserved simultaneously in T. Therefore, the δ h in (8) takes over the role of C in (7) for reflection suppression.
Since the data fidelity term in (8) contains only second-order derivatives of transmission T, the optimal solution of T will not be unique, and any optimal transmission T shifted by a affine function A, i.e., T + A, will also be an optimal transmission. To ensure the uniqueness of the optimal solution, the second term of (8) is incorporated. The model parameter ε should be taken as a positive tiny number so as not to cancel out the effect of reflection suppression, which is performed via the first term.

Proposed Method
In this section, we present an improved version of the model of Yang et al. [3] given in (8) for single-image reflection suppression. The proposed model is based on multiscale hard thresholding, and a direct solver using the discrete cosine transform for implementing the proposed model is also given.

Multiscale Thresholding
The hard thresholding model given in (8) suppresses the reflections by eliminating the gradients of Y at pixel point x with magnitude |∇Y(x)| less than the prescribed threshold value h. For a small h, the model eliminates only small gradients; thus, only weak reflections will be suppressed while strong reflections remain in the optimal transmission. For a large h, the model eliminates both small and large gradients; thus, both weak and strong reflections will be suppressed simultaneously. However, when h is large, many important image edges in the transmission with their gradient magnitudes smaller than h will also be suppressed. Therefore, a large h in (8) usually causes information loss in the optimal transmission. This is the reason why the model should assume that reflection edges are smaller in magnitude and less in focus compared to transmission edges. If this assumption fail to hold, the model will produce either (1) a blurry transmission without any reflections when h is large or (2) a clear transmission but with many reflections remaining when h is small. Therefore, a moderate h is normally picked to trade off between situations (1) and (2) to obtain an acceptable performance.
To suppress both weak and strong reflections while preventing the information loss of important edges in the optimal transmission, we propose a multiscale hard thresholding energy functional to improve the performance of the model of Yang et al. in (8). Beginning with the smallest thresholding scale h, our model additionally considers another N − 1 countable thresholding scales, i.e., {2h, 3h, . . . , Nh}. Hence, there are totally N scales in our model, that is, {h, 2h, 3h, . . . , Nh}, where scale number N is a prescribed positive integer which can be determined by the user. The model in (8) is first modified as follows.
When N = 1, the multiscale energy functional reduces to the energy functional of Yang et al. in (8). When N ≥ 2, the multiscale energy functional has N + 1 terms, and the first N terms are linearly fused into one with uniform weights 1 2 , where each term involves a hard thresholding operation on ∇Y with threshold value nh at n-th scale. Therefore, the first term in (10) can be viewed as a fusion term to integrate all information of the thresholded Laplacian of Y at different scales. As n increases, more and more weak edges are eliminated; thus, only strong edges are left. In other words, the small n term provides the suboptimal transmission T with both strong and weak edges while the large n term provides the suboptimal transmission T with only strong edges. After adding the second term in (10) to ensure the uniqueness of the optimal solution, the final optimal transmission T can be viewed as a linear combination of the sub-optimal transmissions of those subfunctionals corresponding to different thresholding scales. The model parameter ε is also set to be a positive tiny number just as that provided in (8). In this paper, ε will be fixed at 10 −6 .
To further enhance the preserved edge information, we proposed a pixel-dependent weight function ϕ to adapt the thresholded gradients pixelwisely for all subfunctionals in (10) to form our final model for reflection suppression: where the adaptive weight function ϕ is designed to be inversely proportional to |∇Y| as follows.
The role of shifting β in (12) is to maintain a base level of the adaptive weight function. In this paper, β will be fixed at 1.0.

Solving the Model
Since the energy functional proposed in (11) is convex, a transmission T is the optimal solution to (11) if and only if it satisfies the Euler-Lagrange equation derived step by step below: div(ϕδ nh (∇Y))) + εY (13) where L * denotes the adjoint operator of L. Since L is self-adjoint, we have L * = L. Taking the Fourier cosine transform F c on both sides, we have the following.
Denoting K = F c (L) and using component-wise division, we have the following.
Taking the inverse Fourier cosine transform F −1 c , we arrive at the optimal transmission of (11) as follows: where K is the Fourier cosine transform of the Laplacian operator, which can be represented, in its finite dimensional matrix form on an I × J digital image lattice, as follows: and the differential operators L, div and ∇ can be replaced with standard finite difference schemes [25] for digital input images [Y i,j ] for 0 ≤ i ≤ I − 1 and 0 ≤ j ≤ J − 1.
The overall algorithm of the proposed multiscale thresholding method for reflection suppression is summarized in (Algorithm 1), and a toy example in Figure 1 shows the effectiveness of the proposed algorithm.

Algorithm 1:
The proposed algorithm for reflection suppression by multiscale thresholding input : a degraded rgb color image f set : parameters h, N, ε = 10 −6 , β = 1.0 for c in {r, g, b} Y ← f c compute ϕ using (12) compute T using (16) and (17) g c ← T endfor output : the reflection suppression color image g

Numerical Experiment
In this section, we compare the proposed algorithm with the reflection suppression algorithms of Arvanitopoulos et al. [2] and Yang et al. [3], which are all implemented in MATLAB. Both synthetic and real glass images are tested. All experiments are implemented using MATLAB 2019a executed on a Windows PC with Intel Core i7-4790 processor and 16 GB RAM.

Synthetic Glass Image
Two pairs of images of size 512 × 512 were blended into one using Equation (2) to synthesize the test images. The constant weight W i,j ≡ w is set at three different values, which are 0.7, 0.6, and 0.5. The standard deviation σ = 2 is used for the Gaussian blur kernel. The model parameters are fixed at λ = 0.002 for algorithm [2] and h = 0.01, ε = 10 −6 , for both algorithm [3] and the proposed algorithm according to the suggestions in [2,3]. Figure 2a shows two transmission images 'Lena' and 'Clown' and Figure 2b shows two reflection images 'Barbara' and 'House'. The first test image was synthesized by blending 'Lena' with 'Barbara' using constant weight w, and the second test image was synthesized by blending 'Clown' with 'House' using constant weight w. We begin with w = 0.7, which corresponds to a relatively weak reflection effect among the three cases. Figure 3 shows the reflection suppression results of all competing algorithms for the two synthetic glass images. It can be found by observing Figure 3d,k that the proposed algorithm with scale number N = 1 simultaneously possesses very significant image enhancement effects than the two competing algorithms in Figure 3b,c,i,j. As the scale number N increases from 2 to 4 as Figure 3e-g,l-n demonstrates, the proposed algorithm is able to suppress more and more reflections while preserving the edge structures of the transmission images.
Similar results can be found in Figures 4 and 5, which correspond to the relatively moderate (w = 0.6) and strong (w = 0.5) reflection effects, respectively. The proposed algorithm with N = 1 in Figures 4d,k and 5d,k still show significant enhancement effects and the reflections are increasingly suppressed as N increases. To evaluate the competing algorithms quantitatively, we compute the PSNR value for each reflection suppression result using the underlying true transmission image as the reference image. Table 1 shows the PSNR values, and the best value in each row is shown in boldface. It can be seen that the proposed algorithm with N = 2 has the best PSNR values in all cases, and the PSNR values for N = 3 and N = 4 keep very close to that of algorithms [2,3]. This shows that the proposed algorithm is superior to the two representative algorithms. Theoretically, a large N in (11) can suppress more reflections but obtains lower PSNR values since important edges of the transmission are lost in the same time. On the other hand, a small N in (11) can get higher PSNR values but suppressing less reflections. Empirical observations from Table 1 suggest that one can set N = 2 for high PSNR values or set N = 4 for suppressing more reflections. N = 3 is another choice that balances these two extreme choices.
We note that algorithms [2,3] produce very similar image quality and PSNR values. However, this is an expected phenomenon. Although the smooth regularization on T in (7) and the direct thresholding on Y in (8) have quite different functional forms, their core ideas are rather similar, i.e., "to eliminate small image gradients". Model (8) directly eliminates the image gradients of Y, which are less than h in magnitude. For small h, less image gradients are eliminated; for large h, more image gradients are eliminated. On the other hand, model (7) eliminates the image gradients of T in the order of small magnitude to large magnitude, and the number of total eliminated gradients will be determined by model parameter λ. For small λ, less image gradients are eliminated; for large λ, more image gradients are eliminated. Although tuning parameters h and λ can produce very similar results and quality, they are still different since the energy functionals, and (7) and (8) are not equivalent. The two main advantages of model (8) compared to model (7) are as follows: (i) tuning the parameter h is much more intuitive than tuning the parameter λ since h is directly related to gradient magnitude while λ is not; and (ii) the energy functional (8) is convex in T while the energy functional (7) is nonconvex in T, which means that their algorithm will be less efficient.  (a) (a)

Real Glass Image
The real glass dataset of Yang et al. [3] is used here. The test images were captured directly using various smartphones with different resolutions. Figures 6-8 show the reflection suppression results of all the competing algorithms for the six real glass input images.
The 'Dog' image has some reflections on the window glass. Algorithms [2,3] smoothed out some of them, while the proposed algorithm suppressed all reflections and simultaneously enhanced image details. The 'Girl' image has a strong reflection on the left side such that all algorithms can not completely remove it. However, the proposed algorithm with N = 4 suppresses the most reflections among all algorithms. The 'Tree' image has an extremely strong reflection that no algorithm can remove it. However, the proposed algorithm still has the best visual performance by enhancing the image contrast. The 'Office' image has many small rectangle shape reflections that are hard to suppress, the proposed algorithm with N = 4 still suppresses most of them among all algorithms. The 'Child' and 'Library' images have many complex reflections and the proposed algorithm with N = 4 still has the best visual performance. Here, we note that the images 'Office' and 'Library' are relatively challenging among all real examples. The main reason is that the rectangularshaped window reflections have both bright intensities and sharp edges that violate the critical assumption that reflection gradients are smaller in magnitude and lower in focus compared to transmission gradients. If we want to completely remove those reflections, we need to set large h and we will lose many transmission edges as a heavy price. Therefore, the reflections of these two images are relatively hard to suppress. We further note that algorithm [3] and the proposed algorithm share the same model parameters h and ε for all input images.
Since we do not have reflection-free solutions for real glass images for computing PSNR values to quantitatively evaluate the algorithms, here, we compare their computation times instead. Table 2 shows the computation time in seconds for all six input images and all competing algorithms. Although the proposed algorithm costs much time than algorithm [3], it is still very efficient compared to algorithm [2]. Table 2. Computation time of the reflection suppression results in Figure 6. The best value in each row is shown in boldface.  (a) (a)

Ablation Study
Here, we conduct an ablation study for the adaptive weight function ϕ used in the proposed model and provide an experiment of the proposed algorithm with various values of h and β.
We begin with the ablation study by comparing the reflection suppression results of the proposed algorithm with and without the adaptive weight function ϕ, i.e., setting ϕ by (12) and by constant value 1, respectively. Experimental results are provided in Figure 9. From Figure 9b-e, one can observe that, under constant weight ϕ = 1, more and more reflections are suppressed as N increases but the transmission edges also become increasingly blurred. Under the adaptive weight ϕ given in (12), weak transmission edges are enhanced and keep their sharpness as N increases. The results are shown in Figure 9f-i.
To further elaborate the effects of parameters h and β to the proposed model, we test the proposed algorithm under N = 2 and ϕ in (12) with h = 0.025, 0.05, 0.075 and β = 0.25, 0.5, 1. Experimental results are provided in Figure 10. It can be observed that, as the threshold parameter h increases, the strong reflections are increasingly suppressed but the weak transmission edges on the lower building also become increasingly blurred. However, as enhancement parameter β increases, those weak edges become increasingly enhanced. These two experiments demonstrate the role that adaptive weight function ϕ played in the proposed variational model.  (12)) for real glass image 'Building' with various h and β.

Summary and Conclusions
In this paper, we have proposed a both effective and efficient method for suppressing image reflections. Unlike the previous methods that suppressed the reflections only in a single spatial scale, the proposed method can suppress both weak and strong and multiple scale reflections while preserving important edge information via a multiscale hard thresholding framework. A direct solver using the Fourier cosine transform for implementing the proposed method is also given. Experimental results demonstrate that the proposed new algorithm is able to achieve superior performance compared with the previous algorithms. Extending the multiscale idea used in our model for dealing with more challenging situations such as ghosting effect [15,26] should be interesting, and this deserves further studies.

Conflicts of Interest:
The author declares no conflict of interest.