Article

Joint Demosaicing and Denoising Based on a Variational Deep Image Prior Neural Network

by Yunjin Park, Sukho Lee, Byeongseon Jeong and Jungho Yoon

1 Department of Mathematics, Ewha W. University, Seoul 03760, Korea
2 Division of Computer Engineering, Dongseo University, Busan 47011, Korea
3 Institute of Mathematical Sciences, Ewha W. University, Seoul 03760, Korea
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Sensors 2020, 20(10), 2970; https://doi.org/10.3390/s20102970
Submission received: 2 March 2020 / Revised: 18 May 2020 / Accepted: 20 May 2020 / Published: 24 May 2020
(This article belongs to the Special Issue Digital Imaging with Multispectral Filter Array (MSFA) Sensors)

Abstract

A joint demosaicing and denoising task refers to the task of simultaneously reconstructing and denoising a color image from a patterned image obtained by a monochrome image sensor with a color filter array. Recently, inspired by the success of deep learning in many image processing tasks, there has been research on applying convolutional neural networks (CNNs) to the task of joint demosaicing and denoising. However, such CNNs require a large amount of training data and work well only for patterned images with the same noise level they have been trained on. In this paper, we propose a variational deep image prior network for joint demosaicing and denoising which can be trained on a single patterned image and works for patterned images with different levels of noise. We also propose a new RGB color filter array (CFA) which works better with the proposed network than the conventional Bayer CFA. Mathematical justifications of why the variational deep image prior network suits the task of joint demosaicing and denoising are also given, and experimental results verify the performance of the proposed method.

1. Introduction

Nowadays, many digital imaging systems use a single monochrome sensor with a color filter array (CFA) to capture a color image. Without color filters, the monochrome camera sensor would give only brightness or luminance information and could not recover the color of the light that falls on each pixel. To obtain color information, every pixel is covered with a color filter that only lets through a certain color of light: red, green or blue. These sampled red, green and blue channels are then interpolated to fill in the missing information at the pixels for which a certain color could not be sampled. This procedure is called demosaicing, and many methods have been proposed for it [1,2,3,4,5,6,7,8,9].
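As a concrete illustration of the sampling step, the short sketch below simulates how an RGGB Bayer-patterned sensor keeps only one color value per pixel; the function names and the use of NumPy are our own illustrative choices, not part of any particular camera pipeline.

```python
import numpy as np

def bayer_mask(h, w):
    """Binary sampling mask for an RGGB Bayer pattern (illustrative)."""
    mask = np.zeros((h, w, 3), dtype=np.float32)
    mask[0::2, 0::2, 0] = 1.0   # R at even rows / even columns
    mask[0::2, 1::2, 1] = 1.0   # G at even rows / odd columns
    mask[1::2, 0::2, 1] = 1.0   # G at odd rows / even columns
    mask[1::2, 1::2, 2] = 1.0   # B at odd rows / odd columns
    return mask

def mosaic(rgb):
    """Keep only the color value sensed at each pixel; the missing
    values must later be filled in by demosaicing."""
    m = bayer_mask(*rgb.shape[:2])
    return rgb * m, m

# toy usage
rgb = np.random.rand(4, 4, 3).astype(np.float32)
cfa_image, mask = mosaic(rgb)   # two-thirds of the entries are now zero
```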
Current image sensors come in CCD (charge-coupled device) or CMOS (complementary metal–oxide–semiconductor) types, which are both sensitive to thermal noise. Therefore, CFA pattern images taken in low illumination suffer from Poisson-distributed noise. The noise in the CFA pattern image has a large effect on the reconstruction of the color image, as the noise in the noisy pixels spreads out to neighboring regions during the demosaicing process. The denoising is also a challenging task, since at least two-thirds of the data are missing, and complex aliasing problems can occur in the demosaicing process if a poor denoising is applied beforehand. As most digital camera pipelines are sequential, quite often the demosaicing and the denoising are also performed in an independent and sequential way. This leads to an irreversible error accumulation, since both the demosaicing and the denoising are ill-posed problems and the error occurring in one of the procedures cannot be undone in the other. It has been shown that coping with the errors of the denoising and the demosaicing simultaneously has advantages, and some joint demosaicing and denoising methods based on optimization techniques have been developed [10,11,12].
Recently, inspired by the success of the convolutional neural network (CNN) in many image processing tasks, methods which use CNNs for joint demosaicing and denoising have been proposed [13,14,15]. The work in [13] was a first attempt to apply a learning approach to demosaicing, and the work in [14] was a first attempt to use a convolutional network for joint demosaicing and denoising, but it works only for a single noise level. The work in [15] exposes a runtime parameter and trains the network so that it adapts to a wider range of noise levels, but it still works only with relatively low levels of noise. The work in [16] proposes a residue learning neural network structure for the joint demosaicing and denoising problem based on an analysis of the problem using sparsity models, and that in [17] presents a method to learn demosaicing directly from mosaiced images without requiring ground truth RGB data, showing that fine-tuning the network on a specific burst improves the results. Furthermore, the work in [18] proposes a demosaicing network which can be described as an iterative process, together with a principled way to design a denoising network architecture. Such CNN-based methods require large amounts of training data and normally perform poorly when the noise level varies.
In this paper, we propose a deep image prior based method which needs only the noisy patterned image as the training data for the demosaicing. The proposed method uses as input the sum of a constant noise and a varying noise. We give mathematical justifications as to why the added varying input noise results in the denoising of the demosaiced image. Furthermore, we propose a color filter array which suits the proposed demosaicing method and show experimentally that the proposed method yields good joint demosaicing and denoising results.

2. Related Works

The proposed method can be seen as a variation of the following works, adapted to the joint demosaicing and denoising problem.

2.1. Deep Image Prior

Recently, in [19], a deep image prior (DIP) has been proposed for image restoration. The DIP is a type of convolutional neural network which resembles an auto-encoder, but which is trained with a single image $x_0$; i.e., only with the image to be restored. The original DIP converts a 3D tensor $z$ into a restored image $f_\theta(z)$, where $f_\theta(\cdot)$ denotes the deep image prior network with parameters $\theta$. The tensor $z$ is filled with random noise from a uniform distribution. The DIP can be trained to inpaint an image with a loss function as follows:

$$\mathcal{L} = D\big(m \odot f_\theta(z),\; m \odot x_0\big), \qquad (1)$$

where $m \in \{0,1\}^{H \times W}$ is a binary mask with values of zero corresponding to the missing pixels to be inpainted and values of one corresponding to the existing pixels which have to be kept, and $\odot$ is an element-wise multiplication operator. Here, $D$ is a distance measure, which is normally set as the square of the $L_2$ difference; i.e., $D(a,b) = \|a - b\|_2^2$. The minimization of $\mathcal{L}$ in Equation (1) with respect to the parameters $\theta$ of the DIP network has been shown to be capable of inpainting an image; i.e., the minimization results in an inpainted image $f_\theta(z)$. Inpainting and demosaicing are similar in that they try to fill in missing pixels. The difference is that in inpainting the existing pixels have full channel information, i.e., all R, G and B values are available, whereas in demosaicing the existing pixels carry only one of the R, G and B values.
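A minimal PyTorch sketch of the masked loss in Equation (1) might look as follows; here f_theta stands for any DIP-style encoder-decoder network, and the variable names and the Adam learning rate are our own illustrative choices.

```python
import torch

def dip_inpainting_loss(f_theta, z, x0, m):
    """L = || m * f_theta(z) - m * x0 ||_2^2  (Equation (1)).

    f_theta : DIP network (e.g., a U-Net), z : fixed noise tensor,
    x0      : image to be restored,        m : binary mask (1 = known pixel).
    """
    out = f_theta(z)
    return ((m * out - m * x0) ** 2).sum()

# training-loop sketch: only the network parameters are optimized
# optimizer = torch.optim.Adam(f_theta.parameters(), lr=1e-2)
# for step in range(num_steps):
#     optimizer.zero_grad()
#     loss = dip_inpainting_loss(f_theta, z, x0, m)
#     loss.backward()
#     optimizer.step()
```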

2.2. Variational Auto Encoder

The variational auto-encoder [20] is a stochastic variant of the auto-encoder which consists of an encoder $q_\theta(z \mid x)$ and a decoder $p_\phi(x \mid z)$, where both the encoder and decoder are neural networks with parameters $\theta$ and $\phi$, respectively. Given an image $x$ as the input, the encoder $q_\theta(z \mid x)$ outputs the parameters of a Gaussian probability distribution. Samples are then drawn from this distribution to obtain a noise input $z$ to the decoder $p_\phi(x \mid z)$. The space from which $z$ is sampled is stochastic and of lower dimension than the space of $x$. By drawing different samples each time, the variational auto-encoder learns to generate different images. The proposed method also samples noise from a Gaussian distribution, but, unlike the variational auto-encoder, the noise consists of a constant part and a varying part and is fed directly into the network input rather than being used as an intermediate input to the decoder.
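For comparison, the sampling step of a variational auto-encoder is usually implemented with the reparameterization trick of [20], sketched below with illustrative variable names; the proposed method, in contrast, feeds its constant-plus-varying noise directly into the network input.

```python
import torch

def sample_latent(mu, log_var):
    """Reparameterized sampling z = mu + sigma * eps used before the VAE
    decoder p_phi(x|z); eps is a fresh standard-normal draw at each call."""
    eps = torch.randn_like(mu)
    return mu + torch.exp(0.5 * log_var) * eps
```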

3. Variational Deep Image Prior for Joint Demosaicing and Denoising

In this section, we propose a variational deep image prior (DIP) for joint demosaicing and denoising. We denote by $f_\theta$ the DIP network, and use the same network structure as the DIP for inpainting defined in [19]; i.e., a U-Net type network which is downsampled five times and upsampled five times. The loss function for the variational DIP differs from that of the original DIP as follows:
$$\mathcal{L} = \begin{cases} \big\| m_r \odot \big( f_\theta(z_c) - y_k \big) \big\|^2 & \text{for } k < P \\[4pt] \big\| m_r \odot \big( f_\theta(z_c + z_v) - y_k \big) \big\|^2 & \text{for } k \geq P. \end{cases} \qquad (2)$$
Here, $z_c$ and $z_v$ denote the constant noise and the varying noise, respectively, both drawn from a Gaussian distribution, and $m_r$ is the binary mask corresponding to the proposed random CFA; i.e., it consists of three channels, where each channel has the value one at the roughly 33% of randomly located pixels sensed for that color and the value zero at the remaining pixels. Unlike in the inpainting problem, the positions of the pixels having the value one differ for each channel. The input to $f_\theta$ is the constant noise $z_c$ until the $(P-1)$-th training step. After the $(P-1)$-th training step, the input becomes the sum of the constant noise $z_c$ and a varying noise $z_v$, where $z_v$ is newly generated and differs for each training step. The effect of adding this varying noise $z_v$ will be explained later. The target image $y_k$ also differs for the different iteration steps:
$$y_k = \begin{cases} x_0 & \text{for } k < P \\[4pt] \operatorname*{argmin}_y \Big[ (1-\alpha-\beta) \big\| m_r \odot (y - y_{k-1}) \big\|^2 + \alpha \big\| m_r \odot (y - x_0) \big\|^2 + \beta \big\| m_r \odot \big( y - f_\theta(z_c) \big) \big\|^2 \Big] & \text{for } k \geq P \end{cases} \qquad (3)$$
For the steps $k < P$, $y_k = x_0$, where $x_0$ is a three-channel image in which each channel contains the R, G or B intensity values at the roughly 33% of pixel positions where that color is sensed by the random CFA, and zero values at the remaining positions. Furthermore, we assume that $x_0$ contains noise. Therefore, if $y_k = x_0$ for all steps $k$, the reconstructed image $f_\theta$ would converge to a noisy demosaiced image. To avoid this, after the step $k = P$, the target image $y_k$ becomes a weighted average of the previous target image $y_{k-1}$, the given noisy image $x_0$ and $f_\theta(z_c)$. The weights between these images are controlled by $\alpha$ and $\beta$. In the experiments, we let $\alpha = 0.003$ and $\beta = 0.007$ for all images. It should be noted that $f_\theta(z_c)$ differs from the current output $f_\theta(z_c + z_v)$. The image $f_\theta(z_c)$ is a denoised version of $y_{k-1}$, and adding it denoises the target image $y_k$. Adding $x_0$ has the effect of adding back the noise, a trick which is widely used in denoising algorithms to restore fine details to the target image. Adding the previous target image $y_{k-1}$ keeps a record of the denoised target image. By using the improved target image $y_k$, the network is no longer trained toward the given noisy image $x_0$, which results in a better denoised output image. Figure 1 shows the workflow of the proposed method. Here, the orange bullets and the solid lines refer to the neural network used in the computation with the specific parameters $\theta_0, \theta_1, \theta_2, \ldots$, and the dashed lines refer to the inputs to the network. Again, it can be observed that $z_c$ remains constant, while $z_{v_P}, z_{v_{P+1}}, \ldots$ change over time.
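To make the procedure concrete, a minimal PyTorch sketch of one training step is given below. It follows our reading of Equations (2) and (3): the input is $z_c$ before step $P$ and $z_c + z_v$ with a freshly drawn $z_v$ afterwards, and, since the argmin in Equation (3) is a pixel-wise quadratic whose weights sum to one, the target update reduces to the weighted average $(1-\alpha-\beta)\,y_{k-1} + \alpha x_0 + \beta f_\theta(z_c)$ on the sensed pixels. The noise scale sigma_v and all variable names are illustrative assumptions, not the authors' exact settings.

```python
import torch

def training_step(f_theta, optimizer, z_c, y_prev, x0, m_r, k, P,
                  alpha=0.003, beta=0.007, sigma_v=0.05):
    """One step of the variational DIP (a sketch of Equations (2)-(3))."""
    # input-noise schedule of Equation (2)
    if k < P:
        z = z_c
        y_k = x0                                   # target is the noisy CFA image
    else:
        z = z_c + sigma_v * torch.randn_like(z_c)  # fresh varying noise z_v
        with torch.no_grad():
            f_zc = f_theta(z_c)                    # denoised estimate from z_c only
        # closed-form minimizer of the pixel-wise quadratic in Equation (3)
        y_k = (1.0 - alpha - beta) * y_prev + alpha * x0 + beta * f_zc

    # masked loss of Equation (2)
    loss = ((m_r * (f_theta(z) - y_k)) ** 2).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return y_k, loss.item()
```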
Next, we give mathematical justifications for the joint demosaicing and denoising property of the variational DIP. First, the reason why the DIP performs demosaicing can be found in [21], where it is proven that an auto-encoder with an insufficient number of input channels performs an approximation of the target signal under a Hankel structured low-rank constraint:
$$\min_{f \in \mathbb{R}^n} \| f^* - f \|^2 \quad \text{subject to} \quad \operatorname{RANK}\big( \mathbb{H}_{d|p}(f) \big) \leq r < pd, \qquad (4)$$
where $\mathbb{H}_{d|p}(f)$ denotes the Hankel matrix of $f$ with $p$ input channels and a convolution filter of size $d$, and $f^*$ is the target signal. The above approximation can be easily extended to the two-dimensional case. Thus, letting $f = f_\theta(z_c)$ and $f^* = y_k$ for the two-dimensional case, we see that we get a low-rank approximation of $y_k$. It has already been shown in [22] that a low-rank approximation can perform a reconstruction of missing pixels. When applied to the CFA patterned image, this results in a demosaicing. The Hankel structured low-rank approximation in Equation (4) performs a better approximation than the method in [22], since in [22] the low-rank approximation is with respect to the Fourier basis, whereas in Equation (4) it is with respect to a learned basis which best reconstructs the given image, and therefore yields a better demosaicing result.
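As a small illustration of the quantity constrained in Equation (4), consider a one-dimensional single-channel signal $f = (f_1, \ldots, f_5)$ and a filter length $d = 3$; a minimal (non-circular) Hankel matrix, under our simplified reading of the construction in [21], collects the sliding windows of $f$ as rows:

$$\mathbb{H}_{3}(f) = \begin{bmatrix} f_1 & f_2 & f_3 \\ f_2 & f_3 & f_4 \\ f_3 & f_4 & f_5 \end{bmatrix}.$$

A rank constraint on this matrix ties each sample to its neighbors, which is what allows missing entries of $f$ to be inferred from the sampled ones.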
Now, to consider the effect of adding the varying noise $z_v$ to the constant noise $z_c$, we consider a multi-variable vector-valued function $f_\theta(z)$ which can be expressed as a set of $M$ multi-variable scalar-valued functions $f_\theta^i(z)$:
$$f_\theta(z) = \big[ f_\theta^1(z), \ldots, f_\theta^M(z) \big]^T.$$
The Taylor expansion of the $i$-th ($i = 1, \ldots, M$) component is

$$f_\theta^i(z + \delta z) = f_\theta^i(z) + g^T \delta z + \tfrac{1}{2}\, \delta z^T H\, \delta z + \epsilon(\|\delta z\|^3), \qquad (5)$$
where $\delta z = [\delta z_1, \ldots, \delta z_N]^T$ is a vector. The gradient vector $g$ and the Hessian matrix $H$ are the first and second order derivatives of the function $f_\theta^i(z)$, defined as:
$$g = g_{f_\theta^i}(z) = \nabla f_\theta^i(z) = \frac{d}{dz} f_\theta^i(z) = \left[ \frac{\partial f_\theta^i}{\partial z_1}, \ldots, \frac{\partial f_\theta^i}{\partial z_N} \right]^T,$$

$$H = H_{f_\theta^i}(z) = \frac{d}{dz}\, g_{f_\theta^i}(z) = \begin{bmatrix} \dfrac{\partial^2 f_\theta^i}{\partial z_1^2} & \cdots & \dfrac{\partial^2 f_\theta^i}{\partial z_1 \partial z_N} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial^2 f_\theta^i}{\partial z_N \partial z_1} & \cdots & \dfrac{\partial^2 f_\theta^i}{\partial z_N^2} \end{bmatrix}.$$
We can extend Equation (5) to $f_\theta(z)$ in vector form. The first two terms can be written as
$$f_\theta(z + \delta z) \approx \begin{bmatrix} f_\theta^1(z) \\ \vdots \\ f_\theta^M(z) \end{bmatrix} + \begin{bmatrix} \dfrac{\partial f_\theta^1}{\partial z_1} & \cdots & \dfrac{\partial f_\theta^1}{\partial z_N} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial f_\theta^M}{\partial z_1} & \cdots & \dfrac{\partial f_\theta^M}{\partial z_N} \end{bmatrix} \begin{bmatrix} \delta z_1 \\ \vdots \\ \delta z_N \end{bmatrix} = f_\theta(z) + J_{f_\theta}(z)\, \delta z,$$
where $J_{f_\theta}(z)$ is the Jacobian matrix defined over $f_\theta(z)$. The second order term requires a tensor form, which is difficult to express in vector notation; therefore, we replace it with the error term $\epsilon(\|z_v\|^2)$. Then, we can express $f_\theta(z_c + z_v)$ by the Taylor expansion
$$f_\theta(z_c + z_v) = f_\theta(z_c) + J_{f_\theta}(z_c)\, z_v + \epsilon(\|z_v\|^2).$$
This results in
$$\big\| f_\theta(z_c + z_{v_2}) - f_\theta(z_c + z_{v_1}) \big\| = \big\| J_{f_\theta}(z_c)(z_{v_2} - z_{v_1}) + \epsilon(\|z_{v_2}\|^2) - \epsilon(\|z_{v_1}\|^2) \big\| = M\, \| z_{v_2} - z_{v_1} \|_2, \qquad (6)$$
where
$$M = \frac{\big\| J_{f_\theta}(z_c)(z_{v_2} - z_{v_1}) \big\|}{\| z_{v_2} - z_{v_1} \|_2} + \frac{\epsilon(\|z_{v_2}\|^2)}{\| z_{v_2} - z_{v_1} \|_2} - \frac{\epsilon(\|z_{v_1}\|^2)}{\| z_{v_2} - z_{v_1} \|_2},$$
and $J_{f_\theta}(z)$ is the Jacobian matrix defined over the vector function $f_\theta(z)$:

$$J_{f_\theta}(z) = \frac{d}{dz} f_\theta(z).$$
Equation (6) implies that if $M \neq 0$ and $z_{v_2} \neq z_{v_1}$, then $f_\theta(z_c + z_{v_2}) \neq f_\theta(z_c + z_{v_1})$; i.e., the outputs $f_\theta(z_c + z_{v_2})$ and $f_\theta(z_c + z_{v_1})$ cannot be the same. This contradicts the loss function in Equation (2), whose minimization forces the outputs $f_\theta(z_c + z_{v_k})$ to converge to the same target image for all different inputs $z_{v_k}$, $k \geq P$. It should be noted that, with very high probability, $M \neq 0$ and $z_{v_1} \neq z_{v_2}$, since $z_{v_1}$ and $z_{v_2}$ are random noises. Therefore, the different inputs $z_{v_k}$ act as regularizers which eliminate the components with small $L_2$ norm energy from $f_\theta(z_c + z_{v_k})$. As the components with small energy are mostly noise, this results in a noise removal in $f_\theta(z_c + z_{v_k})$.
Furthermore, if we take the expectation of the different outputs $f_\theta(z_c + z_{v_k})$ with respect to $z_{v_k}$, we get
$$\mathbb{E}_{z_{v_k}}\!\big\{ f_\theta(z_c + z_{v_k}) \big\} \approx \mathbb{E}_{z_{v_k}}\!\big\{ f_\theta(z_c) \big\} + \mathbb{E}_{z_{v_k}}\!\big\{ J_{f_\theta}(z_c)\, z_{v_k} \big\} = \mathbb{E}_{z_{v_k}}\!\big\{ f_\theta(z_c) \big\} + J_{f_\theta}(z_c)\, \mathbb{E}_{z_{v_k}}\!\big\{ z_{v_k} \big\} = \mathbb{E}_{z_{v_k}}\!\big\{ f_\theta(z_c) \big\} = f_\theta(z_c),$$
which shows that if we feed $z_c$ as the input after the DIP has been trained, the output $f_\theta(z_c)$ will be approximately the average of the outputs $f_\theta(z_c + z_{v_k})$ for different $z_{v_k}$. This averaging has a further denoising effect which removes the remaining noise.
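This averaging effect can also be checked empirically: feeding $z_c$ alone after training should approximately equal the empirical mean over several perturbed inputs $z_c + z_{v_k}$. A small sketch, with our own variable names and an illustrative noise scale, is given below.

```python
import torch

@torch.no_grad()
def averaged_output(f_theta, z_c, num_samples=16, sigma_v=0.05):
    """Empirical mean of f_theta(z_c + z_v) over fresh draws of z_v.
    By the first-order argument above, this should be close to the single
    forward pass f_theta(z_c), which is used as the final output."""
    outs = [f_theta(z_c + sigma_v * torch.randn_like(z_c))
            for _ in range(num_samples)]
    return torch.stack(outs, dim=0).mean(dim=0)

# final_image = f_theta(z_c)                    # output used by the method
# check       = averaged_output(f_theta, z_c)   # should be close to final_image
```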
The fact that different inputs $z_{v_k}$ result in different outputs can also be shown by the mean value theorem. According to the mean value theorem, there always exists a point $\tilde{z}$ between $z_{v_1}$ and $z_{v_2}$ such that the following equality holds:
$$f_\theta(z_c + z_{v_1}) - f_\theta(z_c + z_{v_2}) = \nabla_z f_\theta(\tilde{z}) \cdot (z_{v_1} - z_{v_2}). \qquad (9)$$
When $z_{v_1} \neq z_{v_2}$, the right-hand side of Equation (9) is, with very high probability, non-zero, since it is very unlikely that $\nabla_z f_\theta(\tilde{z})$ and $(z_{v_1} - z_{v_2})$ are orthogonal to each other. Therefore, with very high probability,
$$\big| f_\theta(z_c + z_{v_1}) - f_\theta(z_c + z_{v_2}) \big| = \big\| \nabla_z f_\theta(\tilde{z}) \big\| \, \| z_{v_1} - z_{v_2} \| \cos(\gamma) \neq 0,$$
where $\gamma$ is the angle between $\nabla_z f_\theta(\tilde{z})$ and $z_{v_1} - z_{v_2}$. This means that if there is a difference between the inputs, then the outputs of the DIP cannot be the same, so there will be an averaging which removes the noise.
Next, we propose a CFA pattern which we think works well with the proposed demosaicing method. The proposed CFA consists of randomly distributed pixels, where the pixels corresponding to the R, G and B channels each take up 33% of the whole CFA pattern. The design of the proposed CFA is not based on a rigorous analysis, as is done in classical CFA designs [23,24,25], but on simple reasoning and experimental results. We reason that if the filters are to learn to generate the R, G and B pixels without any bias toward a specific color or a specific position, the best training strategy is to train the filters to generate any color at any position. For example, if the CFA pattern has 50% green pixels, as in the Bayer format, the convolutional filters will be trained mainly on how to generate the green pixels from the noise. When trained like this, the same convolutional filters may be less effective in generating the R or B pixels. Therefore, we reason that the amount of information should be the same for all three channels; i.e., the CFA should consist of 33% of R, G and B pixels each. By the same reasoning, to avoid a bias toward specific positions, we propose a pattern with randomly distributed color pixels. Experimental results show that the randomly patterned CFA works better with the proposed demosaicing method than the Bayer pattern or the Fuji X-Trans pattern. Figure 2 shows the different color filter arrays (CFAs) used in the experiments, including the proposed CFA. A sketch of how such a random CFA mask could be generated is given below.
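The permutation-based construction below is our own illustrative choice of how to produce a random CFA mask with (roughly) equal R, G and B proportions; the paper does not specify the exact generation procedure.

```python
import numpy as np

def random_cfa_mask(h, w, seed=0):
    """Random CFA: each pixel senses exactly one of R, G, B, with the three
    colors assigned to (roughly) one third of the pixels each."""
    rng = np.random.default_rng(seed)
    labels = np.arange(h * w) % 3          # ~1/3 of the pixels per color
    rng.shuffle(labels)                    # random spatial assignment
    labels = labels.reshape(h, w)
    mask = np.zeros((h, w, 3), dtype=np.float32)
    for c in range(3):
        mask[..., c] = (labels == c)
    return mask

# m_r = random_cfa_mask(512, 768)
# cfa_image = rgb * m_r                    # the observed patterned image x0
```

For the 1:2:1 variant (Random2), the labels could instead be drawn with probabilities 0.25, 0.5 and 0.25 for R, G and B, respectively.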

4. Experimental Results

We compared the proposed method with other deep-learning-based demosaicing methods on three different datasets. We added noise generated from Gaussian distributions with different standard deviations $\sigma_R$, $\sigma_G$ and $\sigma_B$ for the R, G and B channels, respectively, since the R, G and B filters absorb different amounts of light energy. We compared the proposed method with the ADMM method in [28] as a representative of the non-deep-learning demosaicing methods; the sequential energy minimization (SEM) method [14]; the DemosaicNet [15] with two different CFAs, i.e., the DemosaicNet with the Bayer CFA (DNetB) and the DemosaicNet with the Fuji X-Trans CFA (DNetX); and the plain DIP [19]. We made quantitative comparisons with the PSNR (peak signal-to-noise ratio), the CPSNR (color peak signal-to-noise ratio), the SSIM (structural similarity index), the FSIM (feature similarity index) and the FSIMc (feature similarity index chrominance) measures and summarized the results in Tables 2–4. The values in the tables are the average values for the Kodak images and the McMaster (also known as IMAX) images, respectively. Furthermore, the values corresponding to the red, green and blue channels are the average values for those particular channels. The parameters $\alpha$ and $\beta$ in Equation (3) were set to 0.003 and 0.007, respectively, throughout all the experiments.
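The per-channel noise model and the CPSNR measure used in the comparison can be sketched as follows; an 8-bit intensity range is assumed, and the CPSNR is taken here as the PSNR computed jointly over all three channels, which are illustrative assumptions rather than the authors' exact implementation.

```python
import numpy as np

def add_channel_noise(rgb, sigmas=(9.83, 6.24, 6.84), seed=0):
    """Add zero-mean Gaussian noise with a different standard deviation
    per channel (R, G, B), as in the experiments."""
    rng = np.random.default_rng(seed)
    noisy = rgb.astype(np.float64).copy()
    for c, s in enumerate(sigmas):
        noisy[..., c] += rng.normal(0.0, s, size=rgb.shape[:2])
    return np.clip(noisy, 0, 255)

def cpsnr(ref, rec, peak=255.0):
    """Color PSNR: PSNR over the mean squared error of all three channels."""
    mse = np.mean((ref.astype(np.float64) - rec.astype(np.float64)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```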
Table 1 shows the results of the performance comparison when the proposed method was applied to different CFA patterns, i.e., the Bayer [26], the Fuji X-Trans [27], the Lukac [23], the Hirakawa [25] and the proposed CFA patterns, with different noise levels. For the Hirakawa CFA we used the pattern-A and pattern-B patterns, which have RGB ratios of 1:1:1 and 1:2:1, respectively. Likewise, for the proposed random patterned CFA, we used the Random1 pattern (RGB ratio of 1:1:1) and the Random2 pattern (RGB ratio of 1:2:1). The CPSNR, SSIM and FSIMc values are the average values over the images in the Kodak image dataset. When the noise is low, the Hirakawa pattern-A CFA shows the largest CPSNR and SSIM values. However, when the noise increases, the proposed random pattern shows larger CPSNR, SSIM and FSIMc values. This may be due to the fact that when the noise increases, the tasks of demosaicing and denoising become similar (i.e., removing random noise and filling in random colors become similar tasks), so finding the parameters which perform the demosaicing and denoising tasks simultaneously becomes easier with the proposed random CFA than with other CFAs. It can be seen that the proposed CFA pattern mostly shows the largest value, especially when the noise is large.
Figure 3 shows the results of the different demosaicing methods on the first dataset with color noise of standard deviations $\sigma_R = 9.83$, $\sigma_G = 6.24$ and $\sigma_B = 6.84$ for the R, G and B channels, respectively. The parameter $P$ in Equation (3) was set to 1200 for the experiments with this color noise, and to 500 for all the other experiments. When the noise is light, the SEM, the DNetB and the DNetX also produce good joint demosaicing and denoising results. The SEM shows the best quantitative results in the PSNR values for the Kodak dataset, as can be seen in Table 2. However, the proposed method achieves the best results in the SSIM and the FSIM measures for all datasets, and the best PSNR values for the McMaster dataset. Figures 4 and 5 and Tables 2–4 show the results on the dataset with color noise of standard deviations $\sigma_R = 19.67$, $\sigma_G = 12.48$ and $\sigma_B = 13.67$. The ADMM method achieved the highest CPSNR value for the McMaster dataset; this is due to the fact that the ADMM method incorporates the powerful BM3D denoising [29] and the total variation minimization into a single framework, which results in a large denoising power. Therefore, we also experimented with a combination of the proposed method and the BM3D. In this case, the proposed training method can focus more on finding the parameters for the demosaicing task, leaving a large part of the denoising to the BM3D, which results in finding more effective parameters for demosaicing. The results of the SEM, DNetB and DNetX are those without using the external BM3D denoising method. The proposed + BM3D method outperforms the other methods on the Kodak dataset with respect to the PSNR and SSIM measures. As the noise increases, the ADMM, SEM, DNetB and DNetX produce severe color artifacts, as can be observed in the fence regions of the enlarged images in Figure 5b–e. However, the DIP and the proposed method overcome such color artifacts due to the inherent rank minimization property. The figures are selected according to the best PSNR values, which is why the figures for the DIP are a little more blurry than those for the proposed method. The DIP reconstructs the noise when reconstructing the high frequency components, while the proposed method does not. Finally, Figures 6 and 7 and Tables 2–4 show the results on the dataset with color noise of standard deviations $\sigma_R = 26.22$, $\sigma_G = 16.64$ and $\sigma_B = 18.23$. For this dataset, the non-deep-learning ADMM method outperforms all the deep-learning-based methods, including the proposed method, in the quantitative measures. However, the proposed method outperforms all other deep-learning-based methods. Furthermore, while the ADMM shows large aliasing artifacts, as can be seen in Figure 8b, the proposed method is free from such artifacts. Again, it should be taken into account that this is the result of training with only the noisy CFA pattern image. Furthermore, we fixed all the hyper-parameters of the network for all the different noise levels, which means that the proposed method is not sensitive to the noise levels.
Figure 9 compares the convergence of the PSNR values according to the training iterations of the plain DIP and the proposed variational DIP, respectively. As can be seen, the plain DIP converges to a lower PSNR value as the training step iterates, which is due to the fact that the noise in the target image is reconstructed. In comparison, with the proposed variational DIP, the noise is not reconstructed, due to the reasons explained in the previous section. Therefore, the final output image converges to a joint demosaiced and denoised image, which results in a convergence to a higher PSNR value.
Table 5 shows the computational time costs of the different methods. All the methods have been run on a PC with an Intel Core i9-9900K Processor, NVIDIA GeForce RTX 2080 Ti and 32 GB RAM. The proposed method is the slowest of all the methods, which is due to the fact that the proposed method uses a training step for each incoming CFA image. The computational time can be reduced if the proposed method is combined with the meta learning approach. One of the possible methods would be to initialize the neural network with good initial parameters obtained by some pre-training with many images. This should be one of the major topics of further studies.

5. Conclusions

In this paper, we proposed a variational deep image prior for the joint demosaicing and denoising of the proposed random color filter array. We mathematically explained why the variational model results in a demosaicing and denoising result, and experimentally verified the performance of the proposed method. The experimental results showed that the proposed method is superior to other deep-learning-based methods, including the deep image prior network. How to apply the proposed method to the demosaicing of color filter arrays which include channels other than the three primary color channels could be a topic of further study.

Author Contributions

Conceptualization, Y.P., S.L. and J.Y.; methodology, Y.P. and B.J.; software, Y.P. and S.L.; validation, B.J. and J.Y.; formal analysis, S.L. and J.Y.; investigation, J.Y.; resources, Y.P. and B.J.; writing—original draft preparation, Y.P. and S.L.; writing—review and editing, B.J. and J.Y.; visualization, Y.P.; supervision, J.Y.; project administration, J.Y. All authors have read and agreed to the published version of the manuscript.

Funding

The work of J.Y. was supported in part by the National Research Foundation of Korea under grant NRF-2020R1A2C1A01005894 and the work of S.L. was supported by the Basic Science Research Program through the National Research Foundation of Korea under grant NRF-2019R1I1A3A01060150.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Pekkucuksen, I.; Altunbasak, Y. Multiscale gradients-based color filter array interpolation. IEEE Trans. Image Process. 2013, 22, 157–165.
2. Kiku, D.; Monno, Y.; Tanaka, M.; Okutomi, M. Beyond color difference: Residual interpolation for color image demosaicking. IEEE Trans. Image Process. 2016, 25, 1288–1300.
3. Gunturk, B.K.; Altunbasak, Y.; Mersereau, R.M. Color plane interpolation using alternating projections. IEEE Trans. Image Process. 2002, 11, 997–1013.
4. Alleysson, D.; Süsstrunk, S.; Hérault, J. Linear demosaicing inspired by the human visual system. IEEE Trans. Image Process. 2005, 14, 439–449.
5. Gunturk, B.K.; Glotzbach, J.; Altunbasak, Y.; Schafer, R.W.; Mersereau, R.M. Demosaicking: Color filter array interpolation. IEEE Signal Process. Mag. 2005, 22, 44–54.
6. Kimmel, R. Demosaicing: Image reconstruction from color CCD samples. IEEE Trans. Image Process. 1999, 8, 1221–1228.
7. Pei, S.-C.; Tam, I.-K. Effective color interpolation in CCD color filter arrays using signal correlation. IEEE Trans. Circuits Syst. Video Technol. 2003, 13, 503–513.
8. Menon, D.; Calvagno, G. Color image demosaicking: An overview. Signal Process. Image Commun. 2011, 26, 518–533.
9. Dubois, E. Frequency-domain methods for demosaicking of Bayer-sampled color images. IEEE Signal Process. Lett. 2005, 12, 847–850.
10. Hirakawa, K.; Parks, T.W. Joint demosaicing and denoising. IEEE Trans. Image Process. 2006, 15, 2146–2157.
11. Jeon, G.; Dubois, E. Demosaicking of noisy Bayer-sampled color images with least-squares luma-chroma demultiplexing and noise level estimation. IEEE Trans. Image Process. 2013, 22, 146–156.
12. Buades, A.; Duran, J. CFA video denoising and demosaicking chain via spatio-temporal patch-based filtering. IEEE Trans. Circuits Syst. Video Technol. 2019.
13. Khashabi, D.; Nowozin, S.; Jancsary, J.; Fitzgibbon, A.W. Joint demosaicing and denoising via learned nonparametric random fields. IEEE Trans. Image Process. 2014, 23, 4968–4981.
14. Klatzer, T.; Hammernik, K.; Knobelreiter, P.; Pock, T. Learning joint demosaicing and denoising based on sequential energy minimization. In Proceedings of the 2016 IEEE International Conference on Computational Photography (ICCP), Evanston, IL, USA, 13–15 May 2016; pp. 1–11.
15. Gharbi, M.; Chaurasia, G.; Paris, S.; Durand, F. Deep joint demosaicking and denoising. ACM Trans. Graph. 2016, 35, 1–12.
16. Huang, T.; Wu, F.; Dong, W.; Shi, G.; Li, X. Lightweight deep residue learning for joint color image demosaicking and denoising. In Proceedings of the 2018 International Conference on Pattern Recognition (ICPR), Beijing, China, 20–24 August 2018; pp. 127–132.
17. Ehret, T.; Davy, A.; Arias, P.; Facciolo, G. Joint demosaicking and denoising by fine-tuning of bursts of raw images. In Proceedings of the 2019 International Conference on Computer Vision (ICCV), Seoul, Korea, 27 October–2 November 2019; pp. 8868–8877.
18. Kokkinos, F.; Lefkimmiatis, S. Iterative joint image demosaicking and denoising using a residual denoising network. IEEE Trans. Image Process. 2019, 28, 4177–4188.
19. Ulyanov, D.; Vedaldi, A.; Lempitsky, V. Deep image prior. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 9446–9454.
20. Kingma, D.P.; Welling, M. Auto-encoding variational Bayes. In Proceedings of the 2nd International Conference on Learning Representations (ICLR 2014), Banff, AB, Canada, 14–16 April 2014.
21. Ye, J.C.; Han, Y.S. Deep convolutional framelets: A general deep learning framework for inverse problems. SIAM J. Imaging Sci. 2017, 11, 991–1048.
22. Zhou, J.; Kwan, C.; Ayhan, B. A high performance missing pixel reconstruction algorithm for hyperspectral images. In Proceedings of the 2nd International Conference on Applied and Theoretical Information Systems, Taipei, Taiwan, 10–12 February 2012; pp. 1–10.
23. Lukac, R.; Plataniotis, K.N. Color filter arrays: Design and performance analysis. IEEE Trans. Consum. Electron. 2005, 51, 1260–1267.
24. Vaughn, I.J.; Alenin, A.S.; Tyo, J.S. Focal plane filter array engineering I: Rectangular lattices. Opt. Express 2017, 25, 11954–11968.
25. Hirakawa, K.; Wolfe, P.J. Spatio-spectral color filter array design for optimal image recovery. IEEE Trans. Image Process. 2008, 17, 1876–1890.
26. Bayer, B. Color Imaging Array. U.S. Patent 3,971,065, 20 July 1976.
27. Fujifilm X-Pro1. Available online: http://www.fujifilmusa.com/products/digital_cameras/x/fujifilm_x_pro1/features (accessed on 23 May 2020).
28. Tan, H.; Zeng, X.; Lai, S.; Liu, Y.; Zhang, M. Joint demosaicing and denoising of noisy Bayer images with ADMM. In Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP 2017), Beijing, China, 17–20 September 2017; pp. 2951–2955.
29. Dabov, K.; Foi, A.; Katkovnik, V.; Egiazarian, K. Image denoising by sparse 3-D transform-domain collaborative filtering. IEEE Trans. Image Process. 2007, 16, 2080–2095.
Figure 1. Diagram of the proposed method. The variational noise $z_{v_P}, z_{v_{P+1}}, \ldots$ is added after the $(P-1)$-th iteration.
Figure 2. Different color filter arrays (CFAs): (a) Bayer CFA [26]; (b) Fuji X-Trans CFA [27]; (c) Lukac [23]; (d) Hirakawa Pattern-A [25]; (e) Hirakawa Pattern-B [25]; (f) proposed CFA (1:1:1); (g) proposed CFA (1:2:1).
Figure 3. Reconstruction results for the Kodak number 19 image with noise levels $\sigma_R = 9.83$, $\sigma_G = 6.24$ and $\sigma_B = 6.84$: (a) original, (b) ADMM [28], (c) SEM [14], (d) DNetB [15], (e) DNetX [15], (f) DIP [19], (g) proposed and (h) proposed + BM3D.
Figure 4. Reconstruction results for the Kodak number 19 image with noise levels $\sigma_R = 19.67$, $\sigma_G = 12.48$ and $\sigma_B = 13.67$: (a) original, (b) ADMM [28], (c) SEM [14], (d) DNetB [15], (e) DNetX [15], (f) DIP [19], (g) proposed and (h) proposed + BM3D.
Figure 5. Enlarged regions of Figure 4: (a) original, (b) ADMM [28], (c) SEM [14], (d) DNetB [15], (e) DNetX [15], (f) DIP [19], (g) proposed and (h) proposed + BM3D.
Figure 6. Reconstruction results for the Kodak number 5 image with noise levels $\sigma_R = 26.22$, $\sigma_G = 16.64$ and $\sigma_B = 18.23$: (a) original, (b) ADMM [28], (c) SEM [14], (d) DNetB [15], (e) DNetX [15], (f) DIP [19], (g) proposed and (h) proposed + BM3D.
Figure 7. Enlarged regions of Figure 6: (a) original, (b) ADMM [28], (c) SEM [14], (d) DNetB [15], (e) DNetX [15], (f) DIP [19], (g) proposed and (h) proposed + BM3D.
Figure 8. Enlarged regions of the denoising results on the Kodak number 19 image with noise levels $\sigma_R = 26.22$, $\sigma_G = 16.64$ and $\sigma_B = 18.23$: (a) original, (b) ADMM [28], (c) SEM [14], (d) DNetB [15], (e) DNetX [15], (f) DIP [19], (g) proposed and (h) proposed + BM3D.
Figure 9. Comparison of the convergence between the DIP and the proposed variational DIP.
Table 1. Comparison of the CPSNR, SSIM and FSIMc values for the different CFA patterns used with the proposed method on the Kodak image dataset.

| Noise Level | Measure | Bayer | Xtrans | Lukac | HirakawaA | HirakawaB | Random1 | Random2 |
|---|---|---|---|---|---|---|---|---|
| $\sigma_R = 9.83$, $\sigma_G = 6.24$, $\sigma_B = 6.84$ | CPSNR | 31.4386 | 32.1435 | 31.8650 | 32.5018 | 32.0552 | 32.1918 | 32.1410 |
| | SSIM-R | 0.8543 | 0.8719 | 0.8596 | 0.8827 | 0.8750 | 0.8718 | 0.8709 |
| | SSIM-G | 0.8870 | 0.8769 | 0.8839 | 0.8972 | 0.8823 | 0.8861 | 0.8863 |
| | SSIM-B | 0.8530 | 0.8744 | 0.8590 | 0.8854 | 0.8863 | 0.8719 | 0.8749 |
| | FSIMc | 0.9771 | 0.9789 | 0.9759 | 0.9368 | 0.9740 | 0.9775 | 0.9775 |
| $\sigma_R = 19.67$, $\sigma_G = 12.48$, $\sigma_B = 13.67$ | CPSNR | 29.2582 | 29.3572 | 29.4057 | 29.7132 | 29.5584 | 29.3914 | 29.3710 |
| | SSIM-R | 0.7724 | 0.7719 | 0.7745 | 0.8257 | 0.8275 | 0.7750 | 0.7927 |
| | SSIM-G | 0.8217 | 0.8207 | 0.8244 | 0.8452 | 0.8402 | 0.8230 | 0.8206 |
| | SSIM-B | 0.7948 | 0.7992 | 0.7992 | 0.8328 | 0.8239 | 0.8028 | 0.8084 |
| | FSIMc | 0.9537 | 0.9541 | 0.9548 | 0.9532 | 0.9516 | 0.9548 | 0.9548 |
| $\sigma_R = 26.22$, $\sigma_G = 16.64$, $\sigma_B = 18.23$ | CPSNR | 28.0037 | 28.0340 | 28.0733 | 27.9030 | 27.5895 | 27.9799 | 28.2210 |
| | SSIM-R | 0.7184 | 0.7126 | 0.7180 | 0.7682 | 0.7631 | 0.7127 | 0.7013 |
| | SSIM-G | 0.7801 | 0.7743 | 0.7782 | 0.7682 | 0.7785 | 0.7751 | 0.7809 |
| | SSIM-B | 0.7556 | 0.7517 | 0.7543 | 0.7645 | 0.7596 | 0.7542 | 0.7670 |
| | FSIMc | 0.9364 | 0.9354 | 0.9358 | 0.9415 | 0.9616 | 0.9354 | 0.9372 |
Table 2. Comparison of the PSNR values among the various demosaicing methods on the Kodak and the McMaster image datasets.

| Noise Level | Dataset | Measure | ADMM | SEM | DNetB | DNetX | DIP | Proposed | Proposed + BM3D |
|---|---|---|---|---|---|---|---|---|---|
| $\sigma_R = 9.83$, $\sigma_G = 6.24$, $\sigma_B = 6.84$ | Kodak | cPSNR | 31.3370 | 32.5468 | 31.4053 | 31.3766 | 30.4512 | 32.1410 | 32.6103 |
| | | PSNR-R | 30.9196 | 31.6351 | 30.3037 | 30.4766 | 29.5487 | 31.5406 | 32.2743 |
| | | PSNR-G | 31.9553 | 33.2664 | 32.1838 | 32.0917 | 31.1663 | 32.5779 | 32.8726 |
| | | PSNR-B | 31.2722 | 32.9731 | 32.0131 | 31.7628 | 30.9852 | 32.4079 | 32.7272 |
| | McMaster | cPSNR | 32.1258 | 30.7908 | 31.1171 | 31.0155 | 29.6490 | 32.6603 | 33.0659 |
| | | PSNR-R | 31.8458 | 29.7472 | 29.9868 | 29.9864 | 28.7221 | 32.2605 | 32.9728 |
| | | PSNR-G | 33.2163 | 32.1750 | 32.2558 | 32.1366 | 30.1651 | 33.1813 | 33.4180 |
| | | PSNR-B | 31.6661 | 30.9114 | 31.5195 | 31.2861 | 30.4179 | 32.8239 | 33.0647 |
| $\sigma_R = 19.67$, $\sigma_G = 12.48$, $\sigma_B = 13.67$ | Kodak | cPSNR | 30.1883 | 26.1682 | 25.9266 | 25.8636 | 27.2001 | 29.3710 | 30.2026 |
| | | PSNR-R | 29.5430 | 25.1485 | 24.5928 | 24.9497 | 26.1787 | 28.7261 | 29.6902 |
| | | PSNR-G | 30.7728 | 26.4776 | 26.6385 | 26.2463 | 27.8791 | 29.9587 | 30.4681 |
| | | PSNR-B | 30.3904 | 27.1772 | 26.9603 | 26.5889 | 27.9003 | 29.9483 | 30.5204 |
| | McMaster | cPSNR | 30.5794 | 25.7280 | 26.1234 | 26.0321 | 26.7356 | 29.7984 | 30.4533 |
| | | PSNR-R | 29.9344 | 24.4154 | 24.6482 | 24.8721 | 25.5419 | 28.9401 | 29.9461 |
| | | PSNR-G | 31.5879 | 26.5727 | 27.0705 | 26.6617 | 27.3464 | 30.4840 | 30.9327 |
| | | PSNR-B | 30.5402 | 26.6814 | 27.2079 | 26.9079 | 27.8159 | 30.3375 | 30.7383 |
| $\sigma_R = 26.22$, $\sigma_G = 16.64$, $\sigma_B = 18.23$ | Kodak | cPSNR | 29.3247 | 23.2843 | 23.4976 | 23.4724 | 25.6570 | 28.2210 | 29.0672 |
| | | PSNR-R | 28.5560 | 22.3131 | 22.0491 | 22.5529 | 24.7341 | 27.3588 | 28.4363 |
| | | PSNR-G | 29.9424 | 23.5352 | 24.2448 | 23.7847 | 26.1365 | 28.7616 | 29.3628 |
| | | PSNR-B | 29.6403 | 24.3010 | 24.6938 | 24.2836 | 26.2839 | 28.8031 | 29.5067 |
| | McMaster | cPSNR | 29.5312 | 23.2554 | 23.8799 | 23.7807 | 25.4785 | 28.2050 | 29.0961 |
| | | PSNR-R | 28.6716 | 21.8944 | 22.2896 | 22.6013 | 24.2930 | 27.1305 | 28.4053 |
| | | PSNR-G | 30.5938 | 23.9948 | 24.8180 | 24.2958 | 26.0823 | 29.0621 | 29.6725 |
| | | PSNR-B | 29.6901 | 24.3931 | 25.1801 | 24.8034 | 26.5224 | 28.9340 | 29.4845 |
Table 3. Comparison of the SSIM values among the various demosaicing methods on the Kodak and the McMaster image datasets.

| Noise Level | Dataset | Measure | ADMM | SEM | DNetB | DNetX | DIP | Proposed | Proposed + BM3D |
|---|---|---|---|---|---|---|---|---|---|
| $\sigma_R = 9.83$, $\sigma_G = 6.24$, $\sigma_B = 6.84$ | Kodak | SSIM-R | 0.8613 | 0.8581 | 0.7774 | 0.7834 | 0.7977 | 0.8709 | 0.8871 |
| | | SSIM-G | 0.8840 | 0.8785 | 0.8267 | 0.8154 | 0.8625 | 0.8863 | 0.8891 |
| | | SSIM-B | 0.8524 | 0.8802 | 0.8288 | 0.8209 | 0.8469 | 0.8749 | 0.8794 |
| | McMaster | SSIM-R | 0.8846 | 0.8016 | 0.7612 | 0.7662 | 0.8017 | 0.8805 | 0.8993 |
| | | SSIM-G | 0.9132 | 0.8577 | 0.8222 | 0.8134 | 0.8657 | 0.9013 | 0.9074 |
| | | SSIM-B | 0.8624 | 0.8192 | 0.8068 | 0.8000 | 0.8407 | 0.8758 | 0.8822 |
| $\sigma_R = 19.67$, $\sigma_G = 12.48$, $\sigma_B = 13.67$ | Kodak | SSIM-R | 0.8253 | 0.5630 | 0.5372 | 0.5405 | 0.6490 | 0.7927 | 0.8300 |
| | | SSIM-G | 0.8535 | 0.6007 | 0.6087 | 0.5762 | 0.7501 | 0.8206 | 0.8372 |
| | | SSIM-B | 0.8264 | 0.6256 | 0.6218 | 0.5931 | 0.7320 | 0.8084 | 0.8270 |
| | McMaster | SSIM-R | 0.8416 | 0.5258 | 0.5320 | 0.5385 | 0.6529 | 0.8104 | 0.8378 |
| | | SSIM-G | 0.8817 | 0.6083 | 0.6185 | 0.5919 | 0.7725 | 0.8467 | 0.8588 |
| | | SSIM-B | 0.8288 | 0.5941 | 0.6209 | 0.6007 | 0.7485 | 0.8241 | 0.8221 |
| $\sigma_R = 26.22$, $\sigma_G = 16.64$, $\sigma_B = 18.23$ | Kodak | SSIM-R | 0.7972 | 0.4296 | 0.4240 | 0.4319 | 0.5881 | 0.7013 | 0.7948 |
| | | SSIM-G | 0.8315 | 0.4642 | 0.4972 | 0.4657 | 0.6875 | 0.7809 | 0.8074 |
| | | SSIM-B | 0.8058 | 0.4858 | 0.5093 | 0.4819 | 0.6706 | 0.7670 | 0.7978 |
| | McMaster | SSIM-R | 0.8091 | 0.4066 | 0.4289 | 0.4364 | 0.5915 | 0.7463 | 0.7961 |
| | | SSIM-G | 0.8579 | 0.4828 | 0.5173 | 0.4864 | 0.7156 | 0.8116 | 0.8265 |
| | | SSIM-B | 0.8031 | 0.4672 | 0.5213 | 0.4962 | 0.6863 | 0.7839 | 0.7873 |
Table 4. Comparison of the FSIM values among the various demosaicing methods on the Kodak and the McMaster image datasets.

| Noise Level | Dataset | Measure | ADMM | SEM | DNetB | DNetX | DIP | Proposed | Proposed + BM3D |
|---|---|---|---|---|---|---|---|---|---|
| $\sigma_R = 9.83$, $\sigma_G = 6.24$, $\sigma_B = 6.84$ | Kodak | FSIMc | 0.9722 | 0.9792 | 0.9743 | 0.9730 | 0.9666 | 0.9775 | 0.9764 |
| | | FSIM-R | 0.9580 | 0.9635 | 0.9469 | 0.9471 | 0.9469 | 0.9665 | 0.9705 |
| | | FSIM-G | 0.9729 | 0.9794 | 0.9738 | 0.9734 | 0.9639 | 0.9736 | 0.9737 |
| | | FSIM-B | 0.9574 | 0.9728 | 0.9650 | 0.9607 | 0.9606 | 0.9704 | 0.9708 |
| | McMaster | FSIMc | 0.9807 | 0.9788 | 0.9756 | 0.9746 | 0.9708 | 0.9824 | 0.9824 |
| | | FSIM-R | 0.9650 | 0.9565 | 0.9478 | 0.9481 | 0.9540 | 0.9716 | 0.9756 |
| | | FSIM-G | 0.9802 | 0.9773 | 0.9743 | 0.9743 | 0.9667 | 0.9779 | 0.9784 |
| | | FSIM-B | 0.9629 | 0.9600 | 0.9613 | 0.9587 | 0.9638 | 0.9749 | 0.9754 |
| $\sigma_R = 19.67$, $\sigma_G = 12.48$, $\sigma_B = 13.67$ | Kodak | FSIMc | 0.9598 | 0.9232 | 0.9225 | 0.9175 | 0.9276 | 0.9548 | 0.9562 |
| | | FSIM-R | 0.9399 | 0.8936 | 0.8654 | 0.8738 | 0.8978 | 0.9548 | 0.9463 |
| | | FSIM-G | 0.9603 | 0.9266 | 0.9217 | 0.9210 | 0.9270 | 0.9514 | 0.9540 |
| | | FSIM-B | 0.9471 | 0.9259 | 0.9068 | 0.9014 | 0.9245 | 0.9490 | 0.9520 |
| | McMaster | FSIMc | 0.9695 | 0.9290 | 0.9304 | 0.9268 | 0.9378 | 0.9625 | 0.9656 |
| | | FSIM-R | 0.9462 | 0.8911 | 0.8749 | 0.8803 | 0.9086 | 0.9467 | 0.9525 |
| | | FSIM-G | 0.9683 | 0.9315 | 0.9277 | 0.9284 | 0.9345 | 0.9582 | 0.9618 |
| | | FSIM-B | 0.9509 | 0.9197 | 0.9085 | 0.9056 | 0.9320 | 0.9554 | 0.9591 |
| $\sigma_R = 26.22$, $\sigma_G = 16.64$, $\sigma_B = 18.23$ | Kodak | FSIMc | 0.9502 | 0.8811 | 0.8835 | 0.8776 | 0.9007 | 0.9372 | 0.9433 |
| | | FSIM-R | 0.9255 | 0.8479 | 0.8148 | 0.8272 | 0.8687 | 0.9042 | 0.9305 |
| | | FSIM-G | 0.9508 | 0.8868 | 0.8848 | 0.8832 | 0.8981 | 0.9352 | 0.9417 |
| | | FSIM-B | 0.9383 | 0.8876 | 0.8676 | 0.8607 | 0.8969 | 0.9329 | 0.9406 |
| | McMaster | FSIMc | 0.9601 | 0.8919 | 0.8971 | 0.8929 | 0.9164 | 0.9470 | 0.9530 |
| | | FSIM-R | 0.9309 | 0.8462 | 0.8291 | 0.8379 | 0.8855 | 0.9258 | 0.9351 |
| | | FSIM-G | 0.9588 | 0.8982 | 0.8946 | 0.8964 | 0.9132 | 0.9437 | 0.9500 |
| | | FSIM-B | 0.9402 | 0.8869 | 0.8727 | 0.8700 | 0.9092 | 0.9403 | 0.9473 |
Table 5. Computational time costs of the different joint demosaicing and denoising methods.

| Method | ADMM | SEM | DNetB | DNetX | DIP | Proposed |
|---|---|---|---|---|---|---|
| Time Cost | 567 s | 465 s | 8 s | 8 s | 525 s | 647 s |
| Software Tool | MATLAB | Python | Python | Python | PyTorch | PyTorch |
