Comparison of Deep Learning and Conventional Demosaicing Algorithms for Mastcam Images

Bayer pattern filters are used in many commercial digital cameras. In the National Aeronautics and Space Administration's (NASA) mast camera (Mastcam) imaging system onboard the Mars Science Laboratory (MSL) rover Curiosity, a Bayer pattern filter is used to capture the RGB (red, green, and blue) color of scenes on Mars. The Mastcam has two cameras, left and right; the right camera has three times better resolution than the left. It is well known that demosaicing introduces color and zipper artifacts. Here, we present a comparative study of demosaicing results using conventional and deep learning algorithms. Sixteen left and fifteen right Mastcam images were used in our experiments. Due to a lack of ground truth images for Mastcam data from Mars, we compared the various algorithms using a blind image quality assessment model. We observed that no single algorithm works best for all images. In particular, a deep learning-based algorithm worked best for the right Mastcam images, while a conventional algorithm achieved the best results for the left Mastcam images. Moreover, a subjective evaluation of five demosaiced Mastcam images was also used to compare the various algorithms.


Introduction
After an eight-month-long journey, the NASA Mars Science Laboratory (MSL) Curiosity rover landed on Mars in 2012 [1]. The rover has several instruments characterizing the Martian surface and environment. The Alpha Particle X-Ray Spectrometer (APXS) [2] and the Laser Induced Breakdown Spectroscopy (LIBS) instrument [3,4] are used for rock sample analysis. The Mastcam multispectral imagers [5][6][7][8][9][10] perform surface characterization by acquiring images with resolutions from a few mm/pixel in the foreground to several m/pixel for distant features. There are two Mastcam multispectral imagers, each capable of imaging in nine different spectral bands and separated by 24.2 cm for stereo imaging [1]. The right imager has three times the resolution of the left and is mainly used for near-field image acquisition. The left imager has a field of view three times wider than that of the right and is mainly used for rover navigation. Of the nine bands for each camera, three are RGB bands, generated by a Bayer pattern filter superimposed on the detector. As in many commercial digital cameras, the use of a Bayer pattern reduces the overall cost of the Mastcam.
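To make the sampling concrete, below is a minimal numpy sketch of how an RGGB Bayer filter records only one color per pixel; the synthetic image and the RGGB phase are illustrative assumptions, not the Mastcam's documented CFA layout:

```python
import numpy as np

def bayer_mosaic(rgb):
    """Sample a full RGB image onto a single-channel RGGB Bayer mosaic:
    even rows alternate R,G; odd rows alternate G,B."""
    h, w, _ = rgb.shape
    mosaic = np.zeros((h, w), dtype=rgb.dtype)
    mosaic[0::2, 0::2] = rgb[0::2, 0::2, 0]  # R at (even row, even col)
    mosaic[0::2, 1::2] = rgb[0::2, 1::2, 1]  # G at (even row, odd col)
    mosaic[1::2, 0::2] = rgb[1::2, 0::2, 1]  # G at (odd row, even col)
    mosaic[1::2, 1::2] = rgb[1::2, 1::2, 2]  # B at (odd row, odd col)
    return mosaic

# Synthetic 4x4 frame with R=10, G=20, B=30 everywhere: the mosaic keeps
# one sample per pixel, and demosaicing must reconstruct the other two.
rgb = np.dstack([np.full((4, 4), c, dtype=np.uint8) for c in (10, 20, 30)])
mosaic = bayer_mosaic(rgb)
```

The single-sensor design is precisely why demosaicing is needed: two thirds of the color information at every pixel must be interpolated.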
The Bayer pattern filter was invented in 1976 [11]. Many debayering/demosaicing algorithms have been developed over the past few decades (see [12][13][14][15][16] and references therein). For demosaicing the RGB bands in Mastcam, the current data processing pipeline uses the Malvar-He-Cutler (MHC) algorithm [17], developed in 2004, because of its relative simplicity to implement in the rover camera's control electronics. In [1], a Directional Linear Minimum Mean Square-Error Estimation (DLMMSE) [18] demosaicing algorithm was also evaluated and found to outperform MHC in some situations.
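MHC is a purely linear method: for example, green at a red site is the bilinear average of the four green neighbors plus a gradient correction computed from nearby red samples. A sketch of that single step using the published 5 × 5 green-at-red kernel, applied at interior pixels of an RGGB mosaic (the flat test scene is synthetic, for illustration only):

```python
import numpy as np

# MHC green-at-red kernel (coefficients sum to 8): the four distance-1
# green neighbors get weight 2, and a red-channel Laplacian correction
# uses the center red (weight 4) and the four distance-2 reds (weight -1).
G_AT_R = np.array([
    [ 0, 0, -1, 0,  0],
    [ 0, 0,  2, 0,  0],
    [-1, 2,  4, 2, -1],
    [ 0, 0,  2, 0,  0],
    [ 0, 0, -1, 0,  0],
]) / 8.0

def mhc_green_at(mosaic, r, c):
    """Estimate G at an interior red site (r, c) of an RGGB Bayer mosaic."""
    patch = mosaic[r - 2:r + 3, c - 2:c + 3].astype(float)
    return float((patch * G_AT_R).sum())

# Sanity check on a flat 6x6 mosaic (R sites = 10, G sites = 20,
# B sites = 30): the gradient correction cancels and G is recovered.
flat = np.zeros((6, 6))
flat[0::2, 0::2], flat[1::2, 1::2] = 10, 30
flat[0::2, 1::2] = flat[1::2, 0::2] = 20
g_est = mhc_green_at(flat, 2, 2)
```

The other color/site combinations use analogous kernels; the whole method is a fixed set of 5 × 5 convolutions, which is what makes it cheap enough for the rover's control electronics.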
Since 2012, deep neural networks, also known as deep learning, have gained considerable attention, and many deep learning algorithms have achieved excellent results across applications. In [19], for example, a deep learning-based algorithm was proposed for joint demosaicing and denoising; we call this algorithm Demosaicnet (DEMONET). The method achieved excellent performance in several tests. We also identified two other deep learning-based algorithms [20,21] with open-source code.
The first key contribution of our research was to investigate whether there are better, more effective algorithms, developed after 2004, for demosaicing Mastcam images. A recent paper by us [22] initiated this effort and investigated the performance of several pixel-level fusion approaches, including equal weighting, unequal weighting, random weighting, and a fusion scheme known as alpha-trimmed mean filtering (ATMF) [23]. Several algorithms [24][25][26][27] with publicly available code were identified and compared. We applied the various demosaicing algorithms to 31 Mastcam images retrieved from the NASA Planetary Data System (PDS).
The second major contribution of our research was the comparison of conventional and deep learning-based demosaicing algorithms for Mastcam images. In addition to the non-deep-learning algorithms [16][17][18][19][20][21][22][23][24][25][26][27], we included four recent ones [28][29][30][31] that demonstrated good performance in recent studies. From our comparative study for Mastcam applications, we made several observations. First, the MHC algorithm still yields reasonable performance on Mastcam images, but recent algorithms can do better. Second, not all deep learning algorithms performed well; only DEMONET outperformed the other deep learning algorithms on Mastcam images. This demonstrates that the selection of a demosaicing algorithm depends strongly on the application; one should not blindly pick an algorithm based on its performance in some other application. Third, DEMONET had the best performance only for the right Mastcam images, based on both subjective and objective evaluations; for the left Mastcam images, its performance was close to that of a conventional algorithm known as exploitation of color correlation (ECC) [28]. Without a systematic study, algorithms other than DEMONET (for the right images) and ECC (for the left images) might have been recommended to NASA or other agencies for future space-mission imaging developments.
It should be noted that this paper is a natural extension of our earlier paper [22]. There are three major differences. First, we used a blind image quality assessment model to generate objective performance metrics for the demosaiced Mastcam images from the various algorithms, because we do not have ground truth Mastcam images; this part is completely new and was not done in [22]. Second, we added more demosaiced images to our subjective evaluations, which will help readers appreciate the results of the high-performing approaches. Third, in addition to the methods in [22], we included three deep learning-based algorithms and four conventional algorithms [28][29][30][31] in our comparative study. Altogether, 17 methods were compared.
This paper is organized as follows. In Section 2, we summarize the various conventional and deep learning approaches. Section 3 provides background on Mastcam images and performance metrics, and summarizes the extensive experimental results using actual Mastcam images. Finally, concluding remarks and future research directions are provided in Section 4.

Algorithms
The following algorithms were evaluated in our experiments; each is briefly summarized below:

• Linear Directional Interpolation and Nonlocal Adaptive Thresholding (LDI-NAT): This algorithm is simple, but the non-local search is time consuming [16].

• MHC: The Malvar-He-Cutler algorithm in [17]. This is the default method used by NASA for demosaicing Mastcam images. The algorithm is very efficient and simple to implement.
• Directional Linear Minimum Mean Square-Error Estimation (DLMMSE): The Zhang and Wu algorithm in [18]. This method was investigated in Bell et al.'s paper [1].

• Adaptive Frequency Domain (AFD): A frequency-domain approach from Dubois [25]. The algorithm can also be used for other mosaicking patterns.

• ATMF: This method is from [23]. At each pixel location, we take the demosaiced pixel values from seven methods, remove the largest and smallest values, and average the rest. This method fuses the results from AFD, AP, LT, DLMMSE, MHC, PCSD, and LDI-NAT.

• Demosaicnet (DEMONET): In [19], a feed-forward network architecture was proposed for demosaicing. There are D + 1 convolutional layers; each layer has W outputs and uses K × K kernels. An initial model was trained using 1.3 million images from ImageNet and 1 million images from MIRFLICKR, and some challenging images were collected to further enhance the trained model. Details can be found in [19]. It should be noted that we also performed some training using only Mastcam images; however, the customized model was not as good as the original one, probably due to a lack of training data, as we have fewer than 100 high-quality Mastcam images.

• Fusion using 3 best (F3) [22]: We only used F3 for Mastcam images. The mean of the pixels from the demosaiced images of LT, MHC, and LDI-NAT is used.

• Bilinear: We used bilinear interpolation as a baseline for Mastcam images because it is the simplest algorithm.

• Sequential Energy Minimization (SEM) [21]: A deep learning approach based on sequential energy minimization, proposed in [21]. Its performance was reasonable, but the computation takes a long time due to the sequential optimization.

• Deep Residual Network (DRL) [20]: A deep learning-based approach proposed for demosaicing, based on a customized convolutional neural network (CNN) with a depth of 10 and a receptive field of size 21 × 21.

• Exploitation of Color Correlation (ECC) [28]: The authors of [28] proposed a scheme that exploits the correlation between different color channels more effectively than many existing algorithms.

• Adaptive Residual Interpolation (ARI) [29]: ARI adaptively combines residual interpolation (RI) and minimized-Laplacian residual interpolation (MLRI) at each pixel, and adaptively selects a suitable iteration number for each pixel instead of using a common iteration number for all pixels.

• Directional Difference Regression (DDR) [30]: DDR obtains regression models using directional color differences of the training images. Once the models are learned, they are used for demosaicing.
It should be noted that F3 and ATMF are both pixel-level fusion methods. Details can be found in [22].
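Both fusion rules reduce to simple per-pixel statistics over a stack of candidate demosaiced images. A numpy sketch of the two rules (the stacked pixel values below are synthetic, chosen only to show the trimming behavior; seven candidates match ATMF's seven inputs):

```python
import numpy as np

def atmf_fuse(stack):
    """Alpha-trimmed mean fusion: at each pixel, sort the candidate
    values across methods, drop the single largest and smallest
    value, and average the rest. stack has shape (n_methods, H, W)."""
    s = np.sort(stack, axis=0)          # sort per pixel across methods
    return s[1:-1].mean(axis=0)         # trim one extreme at each end

def f3_fuse(stack):
    """F3 fusion: plain per-pixel mean over three chosen methods."""
    return stack.mean(axis=0)

# Seven hypothetical estimates of one pixel; 0 and 100 are outliers
# that the trimming step discards before averaging.
est = np.array([3, 5, 4, 100, 4, 5, 0], dtype=float).reshape(7, 1, 1)
fused = atmf_fuse(est)
```

The trimming step is what makes ATMF robust to a single badly behaved demosaicer, at the cost of mildly blurring the contribution of the best one.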

Performance Metrics
Since there is no ground truth, we used blind quality assessment based on the Natural Image Quality Evaluator (NIQE). This "completely blind" image quality assessment (IQA) model [32] captures measurable deviations from the statistical regularities observed in high-quality natural images. The package can be downloaded from Prof. A. C. Bovik's website; details can be found in [32,33].
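Once per-image NIQE scores are available, comparing algorithms reduces to averaging and taking the minimum. A sketch with made-up scores (the numbers and the three-method lineup below are purely illustrative, not the paper's results; real scores would come from a trained NIQE model):

```python
import numpy as np

# Hypothetical NIQE scores (lower is better): rows = methods,
# columns = test images.
methods = ["MHC", "ECC", "DEMONET"]
scores = np.array([
    [6.8, 7.1, 6.5],   # MHC
    [5.4, 5.6, 5.5],   # ECC
    [5.9, 6.1, 6.0],   # DEMONET
])
avg = scores.mean(axis=1)              # average NIQE per method
best = methods[int(np.argmin(avg))]    # lowest average score wins
```

This is exactly the aggregation used in the tables below: per-band scores are averaged over the test images, and the lowest entry in each column is marked in bold.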

Data
Mastcam has two imagers, as shown in Figure 1. The left imager has three times lower resolution than the right. The left is usually used for long-range image acquisition, and the right camera for near-field data collection.

Mastcam images from NASA's PDS data repository were used in our experiments. From PDS, we retrieved 31 actual Mastcam images, which are in Bayer pattern form. There are no ground truth Mastcam images. For visualization purposes, Figures 2 and 3 show some left and right Mastcam images, respectively, demosaiced using the MHC algorithm.


Left Mastcam Image Demosaicing Results
Due to the lack of ground truth Mastcam images, it is not possible to generate peak signal-to-noise ratio (PSNR) and CIELAB [34] metrics, which have been widely used in demosaicing research. Instead, we applied a blind image quality assessment model, the Natural Image Quality Evaluator (NIQE), to assess the demosaiced images. NIQE was developed by researchers at the University of Texas at Austin [32], and we have used it in another project [33]. Here, we customized the NIQE model by training it on high-quality right Mastcam images; there are six bands of high-quality non-RGB images in each Mastcam image cube.
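For contrast, PSNR is the standard full-reference metric: it requires a ground truth image, which is exactly what the Mastcam data lacks. A sketch of the computation (the 8-bit peak value and the toy arrays are assumptions for illustration):

```python
import numpy as np

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between a ground truth image
    and a demosaiced estimate; infinite for a perfect match."""
    mse = np.mean((reference.astype(float) - estimate.astype(float)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)

# With a mean squared error of 1.0 on 8-bit data, PSNR is ~48.13 dB.
ref = np.zeros((2, 2))
est = np.ones((2, 2))
value = psnr(ref, est)
```

Since no reference image exists for Mars scenes, the no-reference NIQE score takes PSNR's place in all of the tables that follow.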
Tables 1-3 show the NIQE scores of the various methods for the R, G, and B bands, respectively. Table 4 shows the NIQE scores averaged over the R, G, and B bands for each method. Lower NIQE scores mean better performance, and bold numbers in each table indicate the best-performing methods. For the left images, we observed the following:

• The MHC method, developed in 2004 and currently used by NASA, is mediocre in terms of average scores; seven algorithms achieved better results.

• The non-deep-learning method ECC achieved the best performance for the left images in the red and blue bands; however, MLRI has good performance in the green band.

• From Table 4, DEMONET performed the best amongst the deep learning algorithms. Its averaged score (5.98) is close to that of ECC (5.51).

Figures 4 and 5 further corroborate the above observations. In fact, ECC and DEMONET have comparable performance in all of the R, G, and B bands.
Figures 6-8 visually compare the demosaiced outputs from the 17 methods. From Figure 6, one can see that bilinear, MHC, AP, LT, LDI-NAT, F3, and ATMF all have strong color distortions. Strong zipper artifacts can be seen in the results from AFD, AP, DLMMSE, PCSD, LDI-NAT, F3, and ATMF. The results of ECC and MLRI also show slight color distortions. The most perceptually pleasing results come from DEMONET, ARI, DRL, and SEM. From Figure 7, strong color distortions can be seen in the results of AFD, LT, MHC, LDI-NAT, and bilinear. Zipper artifacts are strong in the results of AFD, ATMF, AP, DLMMSE, PCSD, and LDI-NAT. Minor color distortions can be seen in MLRI, ECC, and DDR; good results can be found in DEMONET, ARI, DRL, and SEM. In Figure 8, AFD, AP, bilinear, DLMMSE, F3, ATMF, LDI-NAT, LT, and PCSD all show strong color and zipper artifacts, while perceptually good results come from DEMONET, ARI, DDR, DRL, ECC, SEM, and MLRI. In terms of computational complexity, ARI is the slowest amongst the non-deep-learning methods, taking approximately three minutes per image. For the deep learning-based methods, SEM is the slowest because, for some large images, we needed to divide the image into quadrants, each taking a few minutes to process. In any event, computational efficiency is not a concern for this Mastcam project, as the demosaicing is done off-line; image quality is the most important concern.


Right Mastcam Image Demosaicing Results
Tables 5-7 summarize the NIQE scores of the R, G, and B bands of the various methods for the 15 right images, respectively. Table 8 summarizes the averaged R, G, and B scores from the individual tables. We made the following observations:

• In general, the NIQE scores are lower for the right images than for the left images. This is because the right images have three times higher resolution than the left; as a result, neighboring pixels are better correlated, and the right images are easier to demosaic.

• For the right images, DEMONET has the best performance on all images.

• MHC is again mediocre, as seven other algorithms performed better.


• Several non-deep-learning algorithms (ECC, ARI, MLRI) performed better than two of the deep learning-based methods (SEM and DRL).

• Among the non-deep-learning algorithms, ECC is the best-performing one.

Figure 9 plots the averaged NIQE scores of the different methods for the R, G, and B bands from all images, and Figure 10 shows the averaged NIQE scores over all bands from all images. They further demonstrate that DEMONET is the best-performing method for the right images.
In addition to the above objective evaluations, all of the demosaiced images from the various algorithms were subjectively evaluated. Here, we include two sets of right demosaiced images for subjective evaluation. Figures 11 and 12 show the two right Mastcam images. From Figure 11, it is hard to see any color distortions for any of the methods; however, one can observe noticeable zipper artifacts in the results of AFD, AP, LT, DLMMSE, PCSD, LDI-NAT, ATMF, F3, and bilinear. The perceptually good algorithms include DEMONET, ARI, DDR, DRL, ECC, SEM, and MLRI. Although the MHC algorithm was developed in 2004, it still performed reasonably well on this image. From Figure 12, we observe large performance variations across algorithms due to the presence of sharp edges in the scene. Strong color distortion can be seen in the results of the AP, LT, LDI-NAT, and bilinear algorithms, and strong zipper artifacts in the results of the AFD, AP, LT, DLMMSE, PCSD, LDI-NAT, ATMF, F3, and bilinear algorithms. Perceptually good algorithms include MHC, DEMONET, ARI, DDR, DRL, ECC, SEM, and MLRI. In short, DEMONET yielded the best subjective and objective performance on all right Mastcam images. Figure 13 summarizes all methods, with the left and right results together in one chart. It can be seen that the right images have better scores than the left because the right camera's spatial resolution is three times higher.

Figure 1. Mars rover Curiosity and its onboard cameras. Mastcam imagers act as the eyes of the rover for rock sample selection and rover guidance.


Figure 4. Averaged NIQE scores of different bands for left images using all methods.

Figure 5. Averaged NIQE scores of R, G, and B bands for left images using all methods.

Figure 6. Subjective comparison of demosaiced images of different algorithms for left Mastcam image 1.






Figure 7. Subjective comparison of demosaiced images of different algorithms for left Mastcam image 5.

Figure 8. Subjective comparison of demosaiced images of different algorithms for left Mastcam image 6.


Figure 9. Averaged NIQE scores of different bands using all methods for right images.

Figure 10. Averaged NIQE scores over all bands using all methods for right images.



Figure 11. Subjective comparison of demosaiced images of different algorithms for right Mastcam image 1.


Figure 13. Averaged NIQE scores (RGB scores are averaged) of different methods for left and right images.



Table 1. Summary of Natural Image Quality Evaluator (NIQE) scores (lower is better). Sixteen left images: red band.

Table 2. Summary of NIQE scores (lower is better). Sixteen left images: green band.

Table 3. Summary of NIQE scores (lower is better). Sixteen left images: blue band.

Table 4. Summary of NIQE scores (lower is better). Sixteen left images: averaged over the R, G, and B bands.