Optimized Multi-Spectral Filter Array Based Imaging of Natural Scenes

Multi-spectral imaging using a camera with more than three channels is an efficient method to acquire and reconstruct spectral data and is used extensively in tasks like object recognition, relighted rendering, and color constancy. Recently developed methods are used to only guide content-dependent filter selection where the set of spectral reflectances to be recovered are known a priori. We present the first content-independent spectral imaging pipeline that allows optimal selection of multiple channels. We also present algorithms for optimal placement of the channels in the color filter array yielding an efficient demosaicing order resulting in accurate spectral recovery of natural reflectance functions. These reflectance functions have the property that their power spectrum statistically exhibits a power-law behavior. Using this property, we propose power-law based error descriptors that are minimized to optimize the imaging pipeline. We extensively verify our models and optimizations using large sets of commercially available wide-band filters to demonstrate the greater accuracy and efficiency of our multi-spectral imaging pipeline over existing methods.


Introduction
Multi-channel spectral imaging using a single-camera is an efficient way to acquire spectral data. Compared with multi-camera systems that use multiple cameras with modified camera sensitivity, it is more compact and practical. Typically, the multiple different channels of the spectrum are captured (a) over multiple shots using a different color filter in front of the camera sensor for each shot; or (b) in a single shot using a mosaic of the multiple filters on the sensor forming a multiple spectral filter array (MSFA).
Traditionally, narrow band filters are used to provide a large number of channels, thereby providing higher spectral resolution assuring an accurate spectral recovery. However, this severely compromises the acquisition rate in multi-shot capture and the spatial resolution in single shot capture. Since spatial resolution is very important for most scenarios, commercially available multi-spectral cameras are multi-shot. Narrow band filters also significantly compromise light efficiency and therefore almost all multi-spectral cameras use longer exposures, thereby limiting its capture to only static scenes. Recently, use of wide-band filters has been explored to alleviate these problems by reducing the number of channels used [1]. However, the channels are usually chosen in an ad hoc manner, thereby not assuring an accurate spectral recovery. Recently, a work showed that, with channel selection, the spectral reconstruction accuracy could be improved by over 33% [2].
Previous works have shown that both man-made and natural objects or phenomena (like illumination) have smooth spectral power distribution. Statistically, their power spectrum follows a strong power law [3,4]. We exploit this statistical property to carefully select a small number of optimal wide band channels (usually 5-6) for multi-spectral imaging that can ensure an accurate spectral reconstruction when used in an MSFA for a single shot capture. The main contributions of this work are as follows.
• First, we design and develop a novel power-law prior based channel optimization method that models the various errors associated with spectral reconstruction-namely error due to estimation (reconstruction error), noise (imaging error) and demosaicing (demosaicing error). These errors depend only on the camera parameters (e.g., spectral sensitivities of channels, the MSFA pattern, the demosaicing order, and variance of the sensor noise) and not on the content. To the best of our knowledge, this is the first model for defining all the different errors in a content-independent multi-spectral imaging pipeline. • Second, we construct an objective function that quantifies the total error using a combination of the three above-mentioned errors. Next, we use a discrete particle swarm optimization method to optimize the imaging pipeline by (1) selecting a few channels from a large set of candidate channels; (2) constructing a conducive mosaic pattern with the chosen channels on the MSFA; and (3) selecting a channel ordering during demosaicing that minimizes the objective function and hence the total error in spectral reconstruction.

Related Works
Multispectral Imaging and Reconstruction: a large number of the existing imaging techniques today are either bulky, or expensive and require professional calibration [5,6]. Our focus in this paper are the alternate, more compact spectral imaging techniques that are popularly used for commodity applications. There are two existing spectral imaging techniques that fit this bill-(a) compressive spectral imaging and (b) multi-spectral filter array (MSFA) based spectral imaging.
Compressive computational imaging systems randomly encode captured spectra [7], and then reconstruct multi-spectral images using compressive sensing reconstruction techniques [8][9][10]. Taking full advantages of the sparseness of spectral image data, these reconstruction techniques can recover multi-spectral images using fewer observations than conventionally needed. Furthermore, due to the high light throughput (≈0.5 [9]) of these systems, the quality of reconstructed images is robust to imaging noise. However, the sparse recovery process is time-consuming. For example, it can take minutes or even hours to reconstruct a single 512 × 512 × 31 multi-spectral frame. Therefore, it is not suitable for real-time applications, such as rendering relighted scenes, or object detection.
Contrary to compressive computational imaging, MSFA offers a very time efficient option for spectral reconstruction. Recently, it has been shown that, when using wide-band filters, MSFA based methods can be more accurate in the common case where the "photon to read noise ratios" (PRR) is not very low [11]. Therefore, various multi-channel spectral imaging system with MSFA have been designed recently. For example, Yasuma et al. [12] proposed a sub-micron pixel image sensor design with seven colors and two exposures; Sajadi et al. [13] combined a red-green-blue Bayer color filter array (CFA) with a cyan-magenta-yellow CFA in a layered CFA design to capture images with high color fidelity; Monno et al. [14] presented a high-performance multi-spectral demosaicing algorithm and utilized a single sensor with 5-channel MSFA to capture multi-spectral images. However, none of these single-shot methods focus on optimizing image pipeline parameters (like the channels selected, spatial arrangement of channels on the MSFA, demosaicing order) in a unified manner and hence cannot provide the desired spatial and spectral precision with accuracy of reconstruction.
Channel Selection: references [15,16] optimize spectral sensitivity of channels with a regular three-channel RGB Bayer color filter array (CFA) , while references [17,18] optimize channels to minimize the error in spectral recovery, but only for spectral data that have strong priors like known distribution of the captured spectral functions (e.g., daylight [19]). Such domain-specific priors yield content-dependent methods. More importantly, theoretically conducive spectral sensitivity of the channels are assumed (e.g., radial basis function [20,21], Fourier basis function [22]). This yields poor results in practice due to a large deviation of the sensitivities of commercially available filters from such well-behaved functions [23] ( Figure 1). In contrast, we select filters from a set of commercially available wide-band filters to create a practically feasible spectral imaging pipeline. A handful of earlier works take this route. Yasuma et al. provided a cost function to describe spectral recovery error that is optimized to select filters for the generalized assorted pixel (GAP) camera [12]. However, the design and spatial arrangement of the GAP camera lacks freedom, while impacting the quality of spectral recovery significantly. Chi et al. [1] heuristically minimize the condition number of spectral sensitivity of channels to make the spectral recovery specifically robust to noise. Other work [24] optimizing spectral reflectance targets can also be used to select channels. All these methods cannot guarantee spectral error minimization and therefore optimal spectral reconstruction. Recently, Arad et al. [2] used different three-channel combinations to reconstruct multispectral images and choose the best combination. However, it can not be extended to multi-channel selection directly since it is too time-consuming and may depend on specific spectral reconstruction method.
Demosaicing: demosaicing methods have been devised independent of the other elements of a spectral imaging pipeline. Interestingly, it provides a diverse view of a conducive filter array and demosaicing methods. While some prior works [25] emphasize that the spectral sensitivities of neighboring pixels should be highly correlated to minimize inter-channel demosaicing error, other works (e.g., [26]) demonstrate the benefits of completely independent demosaicing fostered by low correlation between narrow band channels. Many works [21,27] treat the channel with highest sampling rate and largest throughput as the dominant channel and complete missing values of other channels that are considered dependent on the guide channel. In addition, a recent work uses a channel-dependent demosaicing strategy for a non-channel-dominant narrow-band MSFA [28].
Comparison with Proposed Work: the aforementioned literature survey reveals that mutually correlated problems of channel selection, filter array design, demosaicing methods and imaging noise have always been addressed in exclusion and therefore failed to yield solutions that will be optimal in terms of accuracy and efficiency for the final output of the entire spectral imaging pipeline. Therefore, we attempt to address all these correlated issues in a unified manner and optimize them together.

Modeling Error in Spectral Recovery
Let the spectral power distribution (SPD) function of the spectrum at spatial coordinate (x, y) be s(x, y, λ). Consider a multi-spectral camera with N channels (N > 3). Let channel i (1 ≤ i ≤ N) have the spectral sensitivity function m i (λ) that can be obtained from specification sheets or measured using a broadband light source using some low-cost methods [29]. Let the response of channel i of the camera to s(x, y, λ) be x i . In a practical system, x i is comprised of ground truth,x i , and errors, i . If the channel i at (x, y) is interpolated during demosaicing, then i = ζ i + δ i where ζ i denotes error due to imaging noise and δ i denotes demosaicing error. For pixels where channel i is captured directly, δ i = 0. Therefore, We can write the above equation using matrices considering all channels as where [λ 1 , λ 2 ] is the range of wavelength of visible light (e.g., 380-780 nm), M is a known matrix of dimension N × K where each row of M denotes one of the channels at a spectral resolution of K, s is the K × 1 SPD function to be recovered and X denotes the N × 1 known response vector and , ζ and δ are all N × 1 error vectors. Let us assume that the estimated spectrumŝ is given by multiplying X with reconstruction matrix W. Using a pseudo-inverse matrix W + = M T (MM T ) −1 or a Wiener pseudo inversion W + = Corr s M T (MCorr s M T ) −1 [30] with an approximated autocorrelation matrix Corr s to represent the reconstruction matrix, the recovered spectrumŝ, is given bŷ We use the mean squared error (MSE) between the original and the recovered spectrum to describe the average error in the estimation as Now, replacing Equation (3) in Equation (4), we get where Corr s , Corr are the autocorrelation matrices E[ss T ], E[ T ], respectively. Interestingly, in the above equation, the first term describes the error due to spectral recovery while the latter accounts for the error due to imaging noise and demosaicing. Next, we analyze the statistical properties of the natural multi-spectral images in order to model and approximate the correlation matrix Corr s and Corr . Finally, we also describe how MSFA patterns and demosaicing interpolation affect demosaicing error and noise, respectively.

Spectral Characteristics of Natural Images
In order to study this, we use three hyperspectral datasets-CAVE dataset, Harvard dataset, and one that we ourselves capture using a Surface Optics SOC-730 (Surface Optics Corporation, San Diego, CA, USA) camera with a 1024 × 1024 spatial resolution and a 2 nm (400-1000 nm) spectral resolution-this dataset contains 60 images of a sample, which is shown in Figure 2. Prior work has shown that multi-spectral images can be decomposed into Cartesian products of spectral and spatial components [31]. Therefore, we discuss them separately. Spectral components-prior works observe the following: (a) illumination and reflectance spectra of natural or man-made objects and phenomena are relatively smooth [32] and therefore frequency-limit functions can be used to approximate them [33]; (b) in addition to being smooth, these illumination and reflectance spectra also show remarkably similar energy distribution in the frequency domain ( Figure 3). In fact, the surfaces formed by these distribution functions have a shape close to an exponential function with an exponent of (−2). Therefore, they follow Steven's power law of human perception. Steven's Law is the fundamental empirical law of human perception that most human responses to input environmental stimuli is related by a power function [34].
We illustrate the aforementioned properties in Figure 3. To eliminate the effect of varying illumination and brightness of the images, we first compute the difference spectrum ∆s = s − µ, where µ is the mean value in spectral direction of multispectral images. For each ∆s, we compute where ∆s F (k) is the Fourier transform of difference spectrum ∆s(λ), k is the frequency of spectrum, 2 − β is the frequency exponent, and β clusters around 0 for natural spectra. Spatial components-it is well known that the RGB real-world images, including both natural landscapes and man-made environments, also follow the Power Law [35,36] and tend to be scale invariant. This property has already been widely used in many computer graphics applications [37], e.g., exposing forgeries, imaging denoising. Interestingly, we also observe this property across different bands of real-world multispectral images ( Figure 3). As in spectral components, we can show for spatial components as well that the difference band images, ∆I = I − µ, where µ is the mean of the spectral band across space, exhibits the same property as: where ∆I F is the 2D Fourier transform of difference band-images ∆I, ( f , θ) is the polar coordinates of 2D frequency of band-images, A(θ) and 2 − α(θ) are the scaling function and the frequency exponent function respectively for each orientation θ. Here, α(θ) clusters around 0 for natural images. Furthermore, for natural objects, the scaling function A(θ) clusters around 1; while, for man-made objects, the value of A(θ) for oblique orientation, θ o , is much smaller than its value for horizontal and vertical orientations, θ h and θ v , respectively. This is due to the fact that man-made objects usually contain horizontal and vertical edges [35].

Modeling Recovery Error
The statistical fact that spatial and spectral components of multi-spectral images follow a power law is not dependent on any particular data set. Therefore, this provides a prior that we use to approximate recovery error Tr HCorr s H T in Equation (6). In order to achieve this, we approximate Corr s by exploiting the content independent power law and the Wiener-Khinchin theorem. Here, every spectral wavelength sample point s i is treated the same. Mathematically, we assume s i = ∆s i + µ (1 ≤ i ≤ K) have the same distribution with the mean value µ and the standard deviation σ; therefore, . We approximate Corr s (i, j) in matrix Corr s using the autocorrelation value Corr s (i, j) = µ 2 + η 1 C s (∆λ), where ∆λ is the distance between spectral band i and j, and η 1 is a scale factor. According to the Wiener-Khinchin theorem, the autocorrelation of a signal could be represented by using the inverse Fourier transform of power spectral function. Thus, a general form of autocorrelation value C s (∆λ) = ∆s(λ)∆s(λ ± ∆λ) dλ can be described as: Finally, our approximation of Corr s can be expressed as Therefore, the final recovery error is estimated by Tr{HCorr s H T }. Compared with the methods constructing the autocorrelation matrix derived from some datasets, we use this error model because it only depends on the average intensity µ and the scale factor η 1 . The effectiveness of the model will be verified in Section 5.1.

Modeling Demosaicing Error and Imaging Noise
In addition to spectral recovery error, the demosaicing error and imaging noise considerably affect the quality of the reconstruction. However, unlike the spectral recovery error, these errors are heavily dependent on (1) the design of MSFA (multi-spectral filter array), which involves the distribution and placement of the different channels in the filter array; and (2) the demosaicing technique that decides the order of the demosaicing process based on the dependency between channels or lack thereof.
Design of MSFA: prior works [39,40] have shown that a good MSFA pattern should have the following properties: (i) it should be spatially uniform to ensure robustness in the face of image sensor imperfections; (ii) it should be periodic to ensure efficiency of image reconstruction; and (iii) it should be neighbor consistent (each channel has the same neighbor channels) to ensure immunity to optical/electrical cross talk between neighboring pixels. Reference [40] presents a binary tree based MSFA design method that assures all the above properties and is elegant in design and implementation. Therefore, we adapt this design and customize it to suit our needs.
In the binary tree based MSFA design [40], the generation of the MSFA is an iterative process of binary splitting as illustrated in Figure 4a. A parent channel evenly divides into two children channels in each splitting, increasing the number of channels by one while scaling down the sampling rate of the children created by half. Reference [40] shows that, since the MSFA patterns satisfy the three properties above, the sibling channels are exchangeable. We can use binary trees to represent such MSFA patterns (Figure 4b) where a leaf node at level l has a sampling rate of 2 −l . Figure 4c,d shows that the patterns generated may not be unique. However, the binary tree for a particular sampling rate combination is unique.   Demosaicing strategy: channel-independent demosaicing completes an undemosaiced channel at a subsampled resolution without dependence of other channels. On the contrary, channel-dependent demosaicing completes an undemosaiced channel at a subsampled resolution using the content from one of more different channels. The channel dependent methods are categorized as fully dependent or partially dependent based on if they use all the channels or a subset thereof for demosaicing, respectively.
Our observation reveals that neither "one-to-many channel-dependent demosaicing" [21] nor "all channel-independent demosaicing" [26] strategy are optimal (see Figure 5). Initially, we start with full resolution mosaic of the most important channel (e.g., green), we use dashed arrows from channel a to channel b to denote the demosaicing of the subsampled resolution channel b with the guidance of the full resolution channel a: self loops indicate channel-independent demosaicing of channel b without guidance while other arrows indicate channel-dependent demosaicing ( Figure 5). It is important to note that channel selection, channel spatial arrangement, demosaicing strategy and imaging noise levels are mutually correlated. Therefore, we devise an adaptive demosaicing based on channels and imaging noise levels. In the following sections, we will model and quantify channel-independent demosaicing and channel-dependent demosaicing errors and use it for the design of the MSFA and the demosaicing method.  Figure 5. Reconstruction spectral errors of a multispectral image, and the reconstruction error statistics (mean, maxinum, standard deviation) of 18 images caused by different demosaicing strategy, the multispectral images are from CAVE dataset. Dashed arrows from channel a to channel b to denote the demosaicing of the subsampled resolution channel b with the guidance of the full resolution channel a (from left to right: "one-to-many channel-dependent demosaicing"; partial channel-dependent demosaicing; "all channel-independent demosaicing").

Channel-Independent Demosaicing Error
First, we consider channel-independent demosaicing for a sensor with resolution R. Let the set of pixels' measuring channel i (at level l in the binary tree) directly on the sensor be I i . Let us consider a pixel p and x i (p) denote the response of channel i (1 ≤ i ≤ m) at the pixel p. If pixel p ∈ I i , the response x i (p) is the addition of noise-free ground truth responsex i (p) and imaging noise ζ i (p). If p ∈ I i , i.e., x i (p) is interpolated from other directly measured pixels during demosaicing, then x i (p) is the sum ofx i (p), ζ i (p) and demosaicing error δ i (p). Therefore, the response x i at pixel p can be written as: where B i (p) is a binary flag that 1 for p ∈ I i and 0, otherwise, and w l i (p, p ) are corresponding interpolation weights, determined by both the level l i and the Gaussian filter used. Combining Equations (2) and (11), we find the demosaicing error and imaging noise of demosaiced pixel p as: The noise of directly observed pixels come from measurement errors and quantization errors resulting from analog to digital conversion and therefore spatially independent [41]. We useζ 2 i to denote the average directly observed noise variance in I i . Noise variance of channel i, E[ζ 2 i ] is a constant that can be estimated using previous method by Shimano [42] as: where ϕ l i denotes ∑ p ∑ p ∈P w 2 l i (p, p ) for abbreviation. Demosaicing error of channel i at pixel p is given by the square of difference between ground truth response x i (p) and the interpolated value obtained from the ground truth responses of the neighboring pixels. Let us now assume that, for each pixel p, the square of ground truth response has the same distribution. Thus, for two pixels, p and q, p = q, . Therefore, demosaicing error variance E[δ 2 i ] of channel i is given by the average of the variance mean(δ i (p) 2 ) of all the pixels in the image and is derived from Equation (14) as: where α l and β l (p, q) are coefficients of E[x 2 i (p)] and E[x i (p)x i (q)], respectively, in the expansion of Equation (14). We model where M i is the known spectral sensitivity of i th channel. We assume a spatially independent autocorrelation matrix E[Ĩ i (p)Ĩ i (p) T ] that can be simplified to E[Ĩ iĨ T i ]. Similar to spectral variable s i , we assume the distribution ofĨ i has the mean value µ and the the standard deviation σ, whereĨ i = µ + ∆Ĩ i . We use C I (∆u, ∆v) = ∆Ĩ(u, v) ∆Ĩ(u ± ∆u, v ± ∆v) du dv to represent the ratio between correlation matrix where (∆u, ∆v) is given by the absolute value of vector difference between locations of pixel p and q, and η 2 is a scale factor. Similar to the approximation of spectral autocorrelation matrix, C I (∆u, ∆v) can also be approximated using Wiener-Khinchin theorem as: Since demosaicing error is independent from imaging noise, the variance E[ 2 j ] in correlation matrix Corr e is the sum of variance of demosaicing error and imaging noise:

Channel-Dependent Demosaicing Error
In channel dependent demosaicing, we denote the guidance channel and the demosaiced channel by g and r from an upper and lower level l g and l r respectively. Prior works show that color-difference invariance in local regions is a reasonable assumption for chrominance (red/blue) channel demosaicing [43] yielding low computational complexity [44]. This color-difference invariance results in a linear demosaicing operation that is convenient for representation and computation. Therefore, we construct a demosaicing error model using the same color-difference invariance and express the response x r (p) of demosaiced channel r at position p as: From the above equation, demosaicing error variance can be transformed into a form like Equation (14) as: wherex r\g denotesx r −x g . Here, the demosaicing error just contains the interpolation error caused in this demosaicing. The imaging noise from demosaiced channel and accumulative errors from guide channel are induced into the noise variance as: where E[ 2 g ] denote the error variance of guide channel g. Note that, in Equation (19), the noise variance of channel r consists of two components: (1) the imaging noise variance ζ 2 r has the same form of channel-independent demosacing (as in Equation (13)); (2) the error variance E[ 2 g ] of guidance channel with the coefficient decided by the level of the channel r and the filter used in the demosaicing algorithm. Similar to the case of channel-independent demosaicing, the error variance of demosaiced channel r can be calculated by . The relationship between channels in demosaicing can be represented using a Demosaicing Forest as illustrated in Figure 6. In this forest, the channel demosaiced without guidance of other channels is the root of a tree in the forest, and a pair of parent and child nodes in a tree represents a pair of guidance and demosaiced channels in channel-dependent demosaicing. Hence, for each channel i, there is a unique demosaicing chain containing all reachable nodes from the root to channel i, denoted by DC(i). Intuitively, if two channels i and j come from different trees, i.e., DC(i) ∩ DC(j) = ∅, then the two channels are independent; otherwise, the two channels would have a lowest common chain element k, where the level of k, l k = max(level(DC(i) ∩ DC(j))). The errors of these two channels are then partially correlated due to sharing of the same component from the error of channel k. Thus, for the off-diagonal elements E[ i j ] (the correlation between the errors of channel i and j) in correlation Corr e , we have: , otherwise, Therefore, the objective function MSE is built with a few parameters: the Gaussian filter we used in demosaicing, scale factors η 1 µ 2 and η 2 µ 2 , and imaging noiseζ 2 i of each channel.

Imaging Optimization Method
In this section, to optimize reconstruction accuracy, we will seek possible combinations of channels, MSFA patterns, and demosaicing patterns that minimize the objective function (Equation (6)) with specific parameters. We propose an optimization method that selects n channels from m candidate channels (n m) and determines the MSFA patterns and demosaicing orders to facilitate accurate recovery of spectral reflectance of most natural and man-made objects.
Considering manufacturing complexity, we assume the number of channels, n, on MSFA to be relatively small (n < 7). Using the binary-tree-based constraint, we know that the number of possible patterns P with n channels are also small. For example, for n < 7, num (P ) = 2, 3, 5 when n = 4, 5, 6. Therefore, we can enumerate every possible pattern for a specific number of channels, and calculate the optimal channel selection, the mosaic pattern, and demosaicing order for each pattern.
However, searching for an ordered subset of n channels from m candidates that would minimize the objective function is NP hard. Therefore, an exhaustive search is time-consuming. For example, with n = 6 and m = 100, we will need to evaluate P(100, 6) ≈ 10 12 permutations. Therefore, we apply a discrete particle swarm optimization (DPSO) method, detailed in Algorithm 1 to search for the optimal imaging parameters.
Particle swarm optimization [45] is a global search method that operates on a population of particles where each particle represents a candidate solution. The method moves these particles around in the search-space and updates the position and velocity of each particle iteratively. for i = 1 ... P do 8: 11: end for 13: (1 ≤ i ≤ P) 14: end for 15: return n ordered channels X gb , demosaicing order O gb ; Let the number of particles be P and the number of iteration be G. Note that ith particle X i denotes ith candidate solution where each solution consists of n ordered channel selected from m candidate channels, and therefore each particle X i is a m × 1 integer vector. In vector X i , the element with positive value k indicates the channel is the k th selected channel in pattern P, while those with 0 value indicate the channels that are not selected. The function f indbest() finds the global optimal tuple of ordered channels X gb and the optimal demosaicing order O gb for the specific pattern P. The behavior of the algorithm is adjusted using four parameters: c 1 , c 2 , w, and p. For each iteration j, the algorithm maintains the global best solution X gb and the local best solution X pb i of each particle. The objective function f (M X i , P) calculates the MSE of the imaging with selected channels X i and specific pattern P for every possible demosaicing orders and then return the minimum MSE.
Each iteration of particle i uses standard procedures in a basic particle swarm optimization but with modified operations. We first calculate the velocity from X i to X pb i and the velocity from X i to X gb using the mutation operation . The velocity from vector A to B is defined as a differential matrix of swapping pairs from A to B. Then, the algorithm updates the velocity V i using weighted sum of previous velocity V i−1 , X pb i X i , and X gb X i ; here, the weighted values indicate the possibility to swapping pairs. Then, the algorithm uses the function reducemat() to retain the principal component of velocity V i by clipping the velocity with a threshold probability p; finally, it updates the solution by adding clipped velocityV i to the current solution X i . The adding operation X i ⊕V i is defined as swapping the status(selected with index or unselected) of the xth channel and the yth channel in set X i , where (x, y) is a swapping pair in matrixV i [46].

Evaluation and Comparison
In this section, we evaluate our error and noise models in simulation using popular multi-spectral image datasets. Next, we compare our optimization method with GAP camera design and Chi's filter selection method. Finally, we explore the optimal number of channels and how imaging noise level affects MSFA design and demosaicing strategy. The candidate set of channels used in simulation is constructed with spectral transmission data of 30 off-the-shelf coating color filters from http://www. rosco.com/filters (see Figure 7).

Error Models
We evaluate the estimated spectral recovery from our method and compare it with Chi's [1] and Shen's method [18]. We randomly choose 100 different channel combinations from the candidate set, each channel combination consisting of six different channels. Next, we calculate average estimated spectral recovery error only (ignoring the error due to noise and demosaicing) for each channel combination using different spectral dataset in [47], which contains natural spectral reflectances in abundance. Figure 8 shows the relationship between objective function values for different channel combinations and the corresponding spectral recovery error on two different datasets for Chi's, Shen's and our method. Note Chi's method is content-independent, but Shen's method uses a prior based on a large generic spectral dataset [32]. Unlike other methods, our method shows a strong linear relationship between evaluated objective function and the estimated spectral recovery error, confirming the higher accuracy of our objective function as a predictor for spectral recovery error. To evaluate our overall models, we consider three different random combination of channels, MSFA patterns, and demosaicing orders as shown in Figure 9. We adopt the bilinear interpolation method and the binary tree-based generic(BTG) demosaicing method [26] as the channel-independent demosaicing, and adopt state-of-the-art guided filter (GF) method [48] and residual interpolation (RI) method [49] as the channel-dependent demosaicing in reconstruction. In error estimation, we used scale factor η 1 = 0.025 and η 2 = 0.15, and random Gaussian noise with SNR ≈ 50 db to the response of camera. The values of the scale factors are directly obtained from the statistics of the multispectral images in the used datasets. The comparison between estimated errors and actual reconstruction errors using different demosaicing methods is shown in Figure 10. The actual errors are acquired by calculating the difference between ground truth and reconstructed results. It is worth noting that impact of different combination is much more than that of different demosaicing methods. Therefore, the combination is the primary factor while the methods are secondary. Our estimated errors are close to the average actual errors and illustrate the suitability of our model as a good descriptor for overall reconstruction error.  Figure 10. Comparison between estimated errors and actual reconstruction errors using different demosaicing methods (BTG + RI, BTG + GF, Bilinear + RI, Bilinear + GF) with the channels and patterns shown in Figure 9. The estimated errors are scaled for comparison.

Comparison with Previous Methods
To verify the effectiveness of our optimization method, we select channels from candidate channel set using our method, decide channel arrangement on MSFA pattern and demosaicing order using three methods-GAP camera [12], Chi and Monno's method [1,14], and our method-as shown in Figure 11, and compare spectral reconstruction accuracy of our method with the other two methods on three different datasets-'CAVE' spectral dataset [38], Harvard's spectral dataset [31], and our dataset. Chi and Monno's method is a combination of Chi's channel selection method and Monno's MSFA design and demosaicing order. We used the demosaicing method in [26] and added random Gaussian noise to responses of camera with SNR ≈ 50 db. To quantify the reconstruction accuracy of multispectral images, especially on edges, we use relative error |∆s(λ)| defined as: Table 1 shows reconstruction error of the three optimized channel combinations in Figure 11 along with the error statistics (mean, maxinum, standard deviation). The optimized channels of our method show superior results to GAP camera's and Chi and Monno's results irrespective of the dataset used. Table 1. Spectral reconstruction error of three optimized channels (see Figure 11) of GAP, Chi and Monno's method, and our method.  Figure 12 shows comparison of the three methods on two multi-spectral images. Since our method takes into account the demosaicing error, our estimated spectral error near sharp edges in images is significantly smaller than others. sRGB Image strong edges strong edges Figure 12. Reconstruction spectral error of multi-spectral images in our database (the 1st row), CAVE spectral database (the 2nd row), and Harvard's database (the 3th row) using a GAP camera, Chi and Monno's method and our mosaic camera (see Figure 11). Please zoom in and see sharp edges in gray images.

Discussion
Optimal Number of Channels. In order to explore the best number of mosaiced channels on MSFAs, in Figure 13, we plot the estimated error using our error model and optimization methods against the different number of channels. Note that the recovery error decreases with an increase in number of channels, while the errors caused by demosaicing and imaging noise increase with the increasing number of channels. This is expected since more channels would result in more accurate recovery while sacrificing spatial resolution for each channel. Therefore, the sweet spot is where the sum of these two curves has a minima. In Figure 13, we see that, using our candidate channels, this is around 5-6 channels.  Figure 13. Our estimated error (spectral recovery error, demosaicing error and imaging noise) on "CAVE" spectral dataset [38] with a different number of mosaiced channels. The errors reveal the optimal number of channels to be around 5-6. Here, we use the MSFA and demosaicing method in [26].
Behavior of Imaging Noises. Adopting our model and optimization method, we can also explore how imaging noise affects optimal channel selection, design of MSFA, and demosaicing strategy via simulation. Figure 14 shows the optimized results under different imaging noise levels. It can be seen that, under higher noise levels, the optimized channel combination would have higher light throughput to preserve SNR of cameras (the observation is similar to previous works [13,20]). We also found that, in the presence of higher noise, the MSFA pattern tends to be more uniform, that is, the binary tree is more balanced. Furthermore, the demosaicing method tends to be more channel-independent. This is expected since, under high noise levels, the magnitude of noise is much larger than the demosaicing error, while channel-dependent demosaicing propagates noise between channels, although it can reduce demosaicing error. These conclusions provide guidelines to effectively reduce the search space in our optimization method.

Conclusions
In summary, we propose new error models for multi-spectral imaging and utilize the models to select optimal channel combination, the pattern of MSFA, and demosaicing order for multi-spectral imaging. We verified the effectiveness of our method and compared it with previous methods.
Our method can also be applied to other similar problems. For example, projection display is a dual imaging system, and, therefore, selecting few efficient primaries for a spectrally accurate display is a dual problem of our channel selection. In other areas, such as wireless sensor networking, where sensor or observation selection [50] is critical, our optimization might provide an effective solution in acceptable time.
In the future, we plan to apply our method to such varied domains, our immediate interest being in multi-spectral display. Moreover, we plan to explore the relationship between spectral sensitivity of tunable filter camera channels and accuracy of multi-spectral image registration.