1. Introduction
When the spectral sensitivities of the RGB sensors found in a camera and the XYZ color matching functions (CMFs) [
1] are exactly a linear transform apart, the camera is said to be colorimetric [
When this colorimetric condition is met, the RGBs measured by the camera—for all and any spectral stimuli—are related by the same linear transform to the corresponding XYZ coordinates. If a camera’s spectral sensitivities are a linear transform from the XYZ CMFs, it satisfies the so-called
Luther condition [
3,
4].
Despite the suitability of colorimetric sensors for color measurement, cameras are, in general, rarely made to be colorimetric due to manufacturing and engineering considerations, including the need for low-cost manufacturing and low noise in the captured images [
5]. Moreover, for cross-illuminant color measurement—where we wish to measure XYZs under different lights and then map the recorded tristimuli to a fixed reference illuminant—the best sensors are not linearly related to the XYZ CMFs [
6,
7,
8].
It was recently proposed [
9,
10] that a color filter could be designed which, when placed in front of the camera, would make it more colorimetric; see
Figure 1 (including for the cross-illuminant measurement case [
10]). In Finlayson et al. [
9], the filter is found that makes the camera best satisfy the Luther condition, that is, the filtered spectral sensitivities are as close as possible to being a linear
$3\times 3$ matrix correction from the XYZ CMFs. Curiously, the Luther-condition optimization is dependent on the target CMFs. If, for example, the best filter is instead found for the sRGB [
11] sensor basis (a linear combination of the XYZ CMFs, as are the cone mechanisms [
12]), then a different optimal filter is found. That there are multiple ‘optimal’ answers perhaps points to a weakness in the optimization statement.
In color science, the Vora-Value is a single number in the interval [0, 1] which reports how close a camera’s spectral sensitivities are to the human visual subspace (HVS) [
13]. When the Vora-Value is 1, the camera is colorimetric; when it is 0, the camera is completely unsuited to the color measurement task. The Vora-Value formula is, by construction, independent of the bases that describe the camera and the HVS [
14]. In effect, the Vora-Value measures how close the vector spaces—respectively spanned by linear combinations of the camera spectral sensitivities and the XYZ CMFs—are to one another. Finlayson and Zhu [
15] employed the Vora-Value as an optimization criterion to solve for the optimal filter for a given camera.
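For concreteness, the Vora-Value is readily computed from its projector definition, $\nu =\frac{1}{3}trace\left(\mathrm{P}\{\mathbf{Q}\}\mathrm{P}\{\mathbf{X}\}\right)$. The numpy sketch below uses random stand-in matrices (not real camera sensitivities or CMFs) and illustrates the basis-independence property: sensitivities that are any invertible $3\times 3$ recombination of the target span the same subspace and so attain a Vora-Value of 1.

```python
import numpy as np

def projector(A):
    """Orthogonal projector onto the column space of A: P{A} = A A^+."""
    return A @ np.linalg.pinv(A)

def vora_value(Q, X):
    """Vora-Value: (1/3) trace(P{Q} P{X}); equals 1 when the spanned subspaces coincide."""
    return np.trace(projector(Q) @ projector(X)) / 3.0

rng = np.random.default_rng(0)
X = rng.random((31, 3))        # stand-in for the 31-sample XYZ CMFs
M = rng.random((3, 3))         # an (almost surely invertible) 3x3 basis change

v_self = vora_value(X @ M, X)  # sensors spanning the HVS subspace: Vora-Value 1
v_rand = vora_value(rng.random((31, 3)), X)  # an arbitrary camera: somewhere in [0, 1]
```

Because only the spanned subspaces enter through the projectors, replacing $\mathbf{X}$ by any linear recombination leaves the value unchanged.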
From an algorithmic viewpoint, the Luther and Vora-Value optimizations are quite different from one another. The Luther-condition optimization [
9] is solved using a simple alternating least-squares procedure which converges quickly. The Vora-Value optimization [
15], however, has a more complex derivation and is optimized by a gradient ascent approach. The Vora-Value optimization converges less quickly, though arguably to a better overall answer (because of the basis-independence property of the Vora-Value).
The key result, which we present in this paper, is a proof that if we choose to optimize the Luther condition using not the XYZ CMFs but a linear combination of them which is orthonormal, then we are optimizing the Vora-Value. This means we can use the quicker-to-converge Alternating Least-Squares (ALS) optimization to find the best Vora-Value filter. We also recast the prior-art gradient-ascent Vora-Value approach so that we can use Newton’s method [
16] (which includes second derivatives) to speed up the optimization. Even compared to the faster Newton’s method, the simple ALS method has competitive performance, though it does converge a little more slowly.
However, an additional advantage of the ALS method is that it is easier to add constraints into its problem formulation than it is for the gradient-based Vora-Value approach. Indeed, prior work [
17] has shown how we can bound both the smoothness of the derived filter and its transmission properties (e.g., that the filter has to transmit at least 40% of the light at every wavelength across the visible spectrum).
Experiments demonstrate that the Vora-Value-optimized color filter can dramatically increase colorimetric accuracy across a large set of commercial cameras.
The rest of the paper is structured as follows. In
Section 2, we review the definition of the Luther condition and the Vora-Value. There, we introduce the concept of an orthonormal basis, which will underpin our optimization methods. In
Section 3, we present the equivalence theorem relating the Luther and Vora-Value optimizations. We also show how Newton’s method can be used to find the filter that maximizes the Vora-Value. Experimental results are reported in
Section 4. The paper concludes in
Section 5.
3. Methods and Algorithms
In previous work [
9], it was shown how to find a filter that best satisfies the Luther condition under the assumption that we wish to find filtered camera sensors that best approximate the XYZ CMFs. Further, the optimization was formulated using the Alternating Least-Squares paradigm. Below, we reformulate the problem and make three important modifications. First, we adopt an orthonormal basis (a linear combination of the CMFs) as the target. This step will help elucidate the similarity between optimizing a filter to best meet the Luther condition and to maximize the Vora-Value. Second, we recast the optimization as a gradient ascent optimization (and study the Newton formulations). Third, to ensure the well-foundedness of the formulation—and to bound the shape of the filter—we add a regularization term that constrains the filter.
3.1. The Modified Luther-Condition Optimization
Physically, the effect of placing a transmissive filter, $\mathbf{f}$ (a $31\times 1$ vector), in front of a camera can be modeled as $\mathrm{diag}\left(\mathbf{f}\right)\mathbf{Q}$, where $\mathrm{diag}\left(\cdot \right)$ is a matrix operation that turns a vector into a diagonal matrix. To ease notation, we define $\mathbf{F}=\mathrm{diag}\left(\mathbf{f}\right)$ and write $\mathbf{FQ}$. When we wish to extract $\mathbf{f}$ from $\mathbf{F}$, we write $\mathbf{f}=\mathrm{ediag}\left(\mathbf{F}\right)$ (‘e’ signifies to ‘extract’ the diagonal).
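In code, the filter model is a one-liner; a small numpy sketch with placeholder data (the $\mathrm{diag}$/$\mathrm{ediag}$ pair maps directly onto numpy’s `diag`):

```python
import numpy as np

rng = np.random.default_rng(1)
Q = rng.random((31, 3))   # placeholder camera sensitivities (31 wavelengths x 3 channels)
f = rng.random(31)        # placeholder filter transmittances

F = np.diag(f)            # F = diag(f)
FQ = F @ Q                # filtered sensitivities
f_back = np.diag(F)       # ediag(F): extract the diagonal back into a vector
```

Because $\mathbf{F}$ is diagonal, $\mathbf{FQ}$ is simply a per-wavelength scaling of the rows of $\mathbf{Q}$ (`f[:, None] * Q`), which is the cheaper form used in practice.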
In Equation (
9) we mathematically formulate the filter-modified Luther-condition optimization:

$$\mu \left(\mathbf{F},\mathbf{M}\right)={\Vert \mathbf{FQM}-\mathbf{V}\Vert}_{F}^{2}+\alpha \,{\mathbf{f}}^{T}\mathbf{f}\qquad (9)$$
where the subscript
${}_{F}$ denotes the Frobenius norm. Note that here we do not constrain the filter to be positive, although a physical filter must have positive transmittance values. We drop this constraint because we found that, empirically, our method always returned an all-positive filter solution. Here we use the symbol
$\mu $ to denote the objective function with the regularization term. Clearly,
${\Vert \mathbf{FQM}-\mathbf{V}\Vert}_{F}^{2}$ is the core optimization: we wish to find
$\mathbf{F}$ (the filter) such that the filtered sensitivities are approximately a
$3\times 3$ linear transform
$\mathbf{M}$ from the orthonormalized CMFs, here
$\mathbf{V}$. The second term in the optimization is the squared norm of the filter where
$\alpha \ge 0$ is a penalty term defined by the user. As
$\alpha $ becomes large, the optimization must effectively choose a filter whose norm is smaller. Small norm filters tend to be smoother and less jaggy. We say that the scalar
$\alpha $ regularizes the optimization.
The regularization term tilts the optimization towards a filter with lower transmittance values over one with higher transmittances. Indeed, a filter that has a value of 1% at every wavelength has a lower norm than one which transmits 100% of the light at every wavelength. However, we argue that this is not a problem. Remember that in the core optimization, $\mathbf{F}$ and $\mathbf{M}$ are recovered up to a scaling factor, that is, $k\mathbf{F}$ and $\frac{1}{k}\mathbf{M}$ (for any $k>0$) are an equally good solution. Viewed through the lens of this scaling indeterminacy, the filter that transmits 1% of the light is equally efficacious as the one that transmits 100%. Pragmatically, because we know this scaling indeterminacy exists, we always scale the optimized filter so that its maximum transmittance value is 100%.
Importantly, the regularization term usefully pushes the optimization towards recovering smoother filters. A very jaggy filter with an average transmittance of (say) 50% has a much higher norm than one that has a uniform 50% transmittance across the spectrum. Jaggy filters are harder to manufacture, so penalizing their discovery in the optimization makes sense.
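To make the alternating least-squares idea concrete, here is an illustrative numpy sketch of the regularized problem (our own minimal implementation on random placeholder data, not the code of the cited work): with $\mathbf{M}$ fixed, each filter value $f_i$ has a closed-form scalar ridge solution; with $\mathbf{f}$ fixed, $\mathbf{M}$ is an ordinary least-squares solve.

```python
import numpy as np

def als_filter(Q, V, alpha=1e-4, iters=200):
    """Alternate between M and f for min ||diag(f) Q M - V||_F^2 + alpha ||f||^2.

    Illustrative sketch: the M-step is plain least-squares; the f-step
    decouples per wavelength into a scalar ridge regression."""
    f = np.ones(Q.shape[0])                         # start from the do-nothing filter
    for _ in range(iters):
        FQ = f[:, None] * Q
        M, *_ = np.linalg.lstsq(FQ, V, rcond=None)  # M-step: best 3x3 map for this f
        A = Q @ M
        # f-step: each f_i minimises ||f_i A_i - V_i||^2 + alpha f_i^2
        f = np.einsum('ij,ij->i', A, V) / (np.einsum('ij,ij->i', A, A) + alpha)
    return f, M

rng = np.random.default_rng(2)
Q = rng.random((31, 3))                    # placeholder camera sensitivities
V, _ = np.linalg.qr(rng.random((31, 3)))   # orthonormal stand-in for the CMF basis

M0, *_ = np.linalg.lstsq(Q, V, rcond=None)
err0 = np.linalg.norm(Q @ M0 - V, 'fro') ** 2               # best fit with no filter
f, M = als_filter(Q, V)
err = np.linalg.norm((f[:, None] * Q) @ M - V, 'fro') ** 2  # fit with the ALS filter
```

Each half-step lowers the regularized objective, so the procedure decreases it monotonically; with a small $\alpha $ the fitting error ends at, or very near, the unfiltered baseline or below it.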
In
Section 3.5, we show how we can add more explicit constraints on the shape and transmittance properties of the filters that we design.
3.2. The Equivalence of the Modified-Luther and Vora-Value Optimizations
We begin by considering the special case of
$\alpha =0$ in Equation (
9):

$$\underset{\mathbf{F},\mathbf{M}}{arg\;min}\;{\Vert \mathbf{FQM}-\mathbf{V}\Vert}_{F}^{2}\qquad (10)$$
Suppose we are given the filter matrix
$\mathbf{F}$; then, employing the closed-form Moore-Penrose inverse solution to the least-squares problem [
35], the best
$\mathbf{M}$ is obtained:

$$\mathbf{M}={\left(\mathbf{FQ}\right)}^{+}\mathbf{V}\qquad (11)$$
Here, we see that the best linear mapping
$\mathbf{M}$ in the Luthercondition optimization of Equation (
9) is essentially a function of the unknown variable matrix
$\mathbf{F}$. By multiplying this newly solved
$\mathbf{M}$ with
$\mathbf{FQ}$, we have

$$\mathbf{FQM}=\mathbf{FQ}{\left(\mathbf{FQ}\right)}^{+}\mathbf{V}=\mathrm{P}\{\mathbf{FQ}\}\mathbf{V}\qquad (12)$$
That is, we premultiply the orthonormal basis for the XYZs,
$\mathbf{V}$, by the projector for
$\mathbf{FQ}$ (see Equation (
5) for the definition of the projector matrix).
Substituting into Equation (
10), the optimization can be rewritten as

$$\underset{\mathbf{F}}{arg\;min}\;{\Vert \left(\mathrm{P}\{\mathbf{FQ}\}-\mathbf{I}\right)\mathbf{V}\Vert}_{F}^{2}\qquad (13)$$
where
$\mathbf{I}$ is the identity matrix. Equation (
13) has the advantage that the optimization depends only on the filter
$\mathbf{F}$.
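The closed-form steps above are easy to verify numerically. With placeholder matrices, the Moore-Penrose solution for $\mathbf{M}$ does indeed satisfy the projector identity $\mathbf{FQM}=\mathrm{P}\{\mathbf{FQ}\}\mathbf{V}$ used above:

```python
import numpy as np

rng = np.random.default_rng(3)
Q = rng.random((31, 3))                   # placeholder camera sensitivities
V, _ = np.linalg.qr(rng.random((31, 3)))  # orthonormal target basis
f = rng.random(31)                        # an arbitrary fixed filter
FQ = np.diag(f) @ Q

M = np.linalg.pinv(FQ) @ V                # closed-form least-squares map for this filter
P_FQ = FQ @ np.linalg.pinv(FQ)            # projector onto the column space of FQ

lhs = FQ @ M                              # filtered, corrected sensitivities
rhs = P_FQ @ V                            # projection of V onto span(FQ)
```

The projector is also idempotent, as required by the axioms used in the proof that follows.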
Theorem 1. The filter $\mathbf{F}$ that minimizes ${\Vert \left(\mathrm{P}\{\mathbf{FQ}\}-\mathbf{I}\right)\mathbf{V}\Vert}_{F}^{2}$ maximizes $\nu (\mathbf{FQ},\mathbf{X})$.
Proof of Theorem 1. We will use the following axioms:
1. $trace\left({\mathbf{A}}^{T}\mathbf{A}\right)={\Vert \mathbf{A}\Vert}_{F}^{2}$
2. $trace\left(\mathbf{AB}\right)=trace\left(\mathbf{BA}\right)$
3. $trace(\mathbf{A}+\mathbf{B})=trace\left(\mathbf{A}\right)+trace\left(\mathbf{B}\right)$
4. $\mathrm{P}\left\{\mathbf{A}\right\}\mathrm{P}\left\{\mathbf{A}\right\}=\mathrm{P}\left\{\mathbf{A}\right\}$ (idempotency of projectors)
5. $\mathrm{P}\left\{\mathbf{A}\right\}={\left(\mathrm{P}\left\{\mathbf{A}\right\}\right)}^{T}$ (the projector is symmetric)
6. $trace\left(\mathrm{P}\left\{\mathbf{A}\right\}\right)=rank\left(\mathbf{A}\right)$
Using axiom 1, we rewrite the formula in Equation (
13) as

$$\underset{\mathbf{F}}{arg\;min}\;trace\left({\mathbf{V}}^{T}{\left(\mathrm{P}\{\mathbf{FQ}\}-\mathbf{I}\right)}^{T}\left(\mathrm{P}\{\mathbf{FQ}\}-\mathbf{I}\right)\mathbf{V}\right)\qquad (14)$$
Now, we carry out some algebraic manipulation of the argument of Equation (
14). Using axiom 2 for moving
${\mathbf{V}}^{T}$ to the other side and recalling from Equation (
7),
$\mathrm{P}\left\{\mathbf{X}\right\}={\mathbf{VV}}^{T}$, we rewrite

$$trace\left({\left(\mathrm{P}\{\mathbf{FQ}\}-\mathbf{I}\right)}^{T}\left(\mathrm{P}\{\mathbf{FQ}\}-\mathbf{I}\right)\mathrm{P}\{\mathbf{X}\}\right)\qquad (15)$$
Manipulating the above expression using the properties of the projector matrix (axioms 3 through 5), we obtain

$$trace\left(\mathrm{P}\{\mathbf{X}\}\right)-trace\left(\mathrm{P}\{\mathbf{FQ}\}\mathrm{P}\{\mathbf{X}\}\right)\qquad (16)$$
Using axiom 6, clearly in Equation (
16),
$trace\left(\mathrm{P}\{\mathbf{X}\}\right)$ is a positive constant given the known matrix
$\mathbf{X}$. Therefore, we minimize the derived expression by maximizing
$trace\left(\mathrm{P}\{\mathbf{FQ}\}\mathrm{P}\{\mathbf{X}\}\right)$. Thus, we can write:

$$\underset{\mathbf{F}}{arg\;min}\;{\Vert \left(\mathrm{P}\{\mathbf{FQ}\}-\mathbf{I}\right)\mathbf{V}\Vert}_{F}^{2}=\underset{\mathbf{F}}{arg\;max}\;trace\left(\mathrm{P}\{\mathbf{FQ}\}\mathrm{P}\{\mathbf{X}\}\right)\qquad (17)$$
From the definition of VoraValue, we know
$\nu (\mathbf{FQ},\mathbf{X})=\frac{1}{3}trace\left(\mathrm{P}\{\mathbf{FQ}\}\mathrm{P}\{\mathbf{X}\}\right)$. We conclude:

$$\underset{\mathbf{F}}{arg\;min}\;{\Vert \left(\mathrm{P}\{\mathbf{FQ}\}-\mathbf{I}\right)\mathbf{V}\Vert}_{F}^{2}=\underset{\mathbf{F}}{arg\;max}\;\nu (\mathbf{FQ},\mathbf{X})\qquad (18)$$
☐
In summary, we have shown that a least-squares procedure that finds a filter that—in combination with a linear least-squares mapping (see Equations (
10)–(
13))—best fits an orthonormal basis of the color matching functions (i.e., minimizes the fitting error) must simultaneously maximize the Vora-Value.
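The theorem is also easy to sanity-check numerically: for any fixed filter, the derivation above implies the exact relation fitting error $=trace\left(\mathrm{P}\{\mathbf{X}\}\right)-3\nu =3-3\nu $, so decreasing one quantity must increase the other. A numpy check on stand-in data:

```python
import numpy as np

def projector(A):
    return A @ np.linalg.pinv(A)

rng = np.random.default_rng(4)
Q = rng.random((31, 3))                  # placeholder camera sensitivities
X = rng.random((31, 3))                  # placeholder CMFs
V, _ = np.linalg.qr(X)                   # orthonormal basis: P{X} = V V^T
f = rng.random(31)                       # any filter at all
FQ = np.diag(f) @ Q

err = np.linalg.norm((projector(FQ) - np.eye(31)) @ V, 'fro') ** 2
vora = np.trace(projector(FQ) @ projector(X)) / 3.0
# the modified-Luther fitting error and the Vora-Value trade off exactly: err = 3 - 3*vora
```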
3.3. Gradient Ascent for Solving the Optimization
So far we have concerned ourselves with formulating the filter design problem, that is, writing down what it is we wish to solve for. Now, we present the mechanics of how we optimize for the filter. Given that we have shown the equivalence of maximizing the Vora-Value and minimizing the modified Luther-condition error, there is only one optimization at hand. The ALS method—for optimizing the Luther condition—is well described in Reference [
10]. Henceforth, in this section, we will concern ourselves with maximizing the Vora-Value (safe in the knowledge that the filter found also optimizes the modified Luther-condition problem). Moreover, presenting the gradient method is a necessary step towards a much faster algorithm: Newton’s method [
16].
Since we are trying to maximize the Vora-Value, we will adopt a gradient ascent procedure. For a given filter, we can calculate its Vora-Value. The set of all Vora-Values can be thought of as a hilly landscape. By calculating the derivative (how the Vora-Value changes) when the filter is modified in certain directions, we can choose to take a small step in a direction that increases the Vora-Value.
By choosing to modify the filter stepwise—at each step seeking to go further up the hill—we modify the filter until we are at the hill top. Of course, there is nothing in the gradient ascent method that guarantees that we will find a globally optimal solution to the filter design problem. At minimum, we will certainly arrive at a filter that is better than the initial guess (e.g., the do-nothing filter $\mathbf{f}={[1,1,\cdots ,1]}^{T}$, which transmits 100% of the light at every wavelength). And, as we present later, we find a very good filter for our problem.
Remember that the Vora-Value is defined to be one third of the trace of the projector of the filtered sensitivities premultiplying the projector for the XYZ CMFs. So we wish to maximize:

$$\nu (\mathbf{FQ},\mathbf{X})=\frac{1}{3}trace\left(\mathrm{P}\{\mathbf{FQ}\}\mathrm{P}\{\mathbf{X}\}\right)\qquad (19)$$
We maximize
$\nu $ in the usual calculus way by first finding the derivative of the Vora-Value function and then updating
$\mathbf{F}$ by moving in the direction of the gradient (and in so doing we increase
$\nu $). It can be shown—details given in Reference [
36]—that the derivative of
$\nu $ with respect to
$\mathbf{F}$ is calculated as:
Equivalently, the gradient of the objective function in terms of the underlying filter vector—remember
$\mathbf{f}$ (
$=ediag\left(\mathbf{F}\right)$)—can be written as
where the gradient with respect to the filter vector,
$\nabla \nu \left(\mathbf{f}\right)$, is a
$31\times 1$ vector. It is evident that the gradient function has a very interesting structure: it is the diagonal of the product of three projection matrices, multiplied by the inverse of the current filter matrix.
The simple Gradient Ascent algorithm for finding the filter solution of the optimization is shown below (see Algorithm 1). Note that we use the term ‘gradient ascent’ (rather than the more familiar ‘gradient descent’) because we are maximizing our objective function. The algorithm works by updating the current filter at step
k to a new filter at step
$k+1$. At step
k we calculate the gradient and then update according to:
${\mathbf{f}}^{k+1}={\mathbf{f}}^{k}+{t}^{k}\nabla \nu \left({\mathbf{f}}^{k}\right)$. The term
${t}^{k}$ is the step size (a small positive value), which controls how far the filter should move in the direction of the gradient. The simplest approach would be to set
${t}^{k}$ to a small fixed number. But, in this case, the optimization would take a long time to converge. To make convergence quicker, we adaptively choose the step size. In our work we find the best step
${t}^{k}$ per iteration using the line search method. We refer the reader to Reference [
37] for more details.
As we will see in the next section, even when we choose the step size judiciously, convergence is still relatively slow. A common technique to find the filter more quickly is Newton’s method (which involves calculating second derivatives). A naive implementation of Newton’s method quickly runs into numerical stability problems, and one way to sidestep these is to adopt a regularized formulation of the Vora-Value objective. Remembering our theorem linking the Luther-condition method to the Vora-Value, we can rewrite the maximization—reincluding the penalty term on the norm of the filter—as a minimization:

$$\underset{\mathbf{f}}{arg\;min}\;\mu \left(\mathbf{f}\right)=\underset{\mathbf{f}}{arg\;min}\;3-3\nu \left(\mathbf{f}\right)+\alpha \,{\mathbf{f}}^{T}\mathbf{f}\qquad (22)$$
It follows that the gradient of the regularized objective (denoted by
$\mu $ in Equation (
9)) is given by

$$\nabla \mu \left(\mathbf{f}\right)=-3\nabla \nu \left(\mathbf{f}\right)+2\alpha \mathbf{f}\qquad (23)$$
Clearly,
$\nabla \mu \left(\mathbf{f}\right)=-3\nabla \nu \left(\mathbf{f}\right)+2\alpha \mathbf{f}$ (the negative sign reflects that we maximize $\nu $ but minimize $\mu $). In the next section we present Newton’s method for finding the regularized filter and use the corresponding gradient from Equation (
23).
Algorithm 1 Gradient Ascent for solving the optimization.
1: $k=0,\phantom{\rule{1.em}{0ex}}{\mathbf{f}}^{0}={\mathbf{f}}^{initial}$
2: repeat
3: Compute the gradient: $\nabla \nu \left({\mathbf{f}}^{k}\right)$
4: Choose a step size: ${t}^{k}>0$
5: Update: ${\mathbf{f}}^{k+1}:={\mathbf{f}}^{k}+{t}^{k}\nabla \nu \left({\mathbf{f}}^{k}\right)$
6: $k\leftarrow k+1$
7: until $max\left(|\nabla \nu \left({\mathbf{f}}^{k}\right)|\right)\le \eta $
8: return ${\mathbf{f}}^{k}$

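A direct, self-contained transcription of Algorithm 1 in numpy is given below. Since the analytic gradient of Reference [36] is not reproduced in this paper, the sketch substitutes a central-difference numerical gradient, and a crude step-halving search stands in for the line search of Reference [37]; random placeholder data replaces real sensitivities.

```python
import numpy as np

def projector(A):
    return A @ np.linalg.pinv(A)

def vora(f, Q, PX):
    """Vora-Value of the filtered sensitivities diag(f) Q against P{X}."""
    FQ = f[:, None] * Q
    return np.trace(projector(FQ) @ PX) / 3.0

def num_grad(f, Q, PX, h=1e-6):
    """Central-difference stand-in for the analytic gradient of Reference [36]."""
    g = np.zeros_like(f)
    for i in range(f.size):
        e = np.zeros_like(f)
        e[i] = h
        g[i] = (vora(f + e, Q, PX) - vora(f - e, Q, PX)) / (2 * h)
    return g

def gradient_ascent(Q, X, eta=1e-6, max_iter=50):
    PX = projector(X)
    f = np.ones(Q.shape[0])                  # f^0: the do-nothing filter
    for _ in range(max_iter):
        g = num_grad(f, Q, PX)
        if np.max(np.abs(g)) <= eta:         # Algorithm 1 stopping rule
            break
        t, v0 = 1.0, vora(f, Q, PX)          # step-halving line search
        while vora(f + t * g, Q, PX) < v0 and t > 1e-12:
            t *= 0.5
        f = f + t * g                        # ascent step: f^{k+1} = f^k + t^k * grad
    return f

rng = np.random.default_rng(5)
Q, X = rng.random((31, 3)), rng.random((31, 3))
f_opt = gradient_ascent(Q, X)
v_before = vora(np.ones(31), Q, projector(X))
v_after = vora(f_opt, Q, projector(X))
```

Each accepted step does not decrease the Vora-Value, so the run improves (or at worst preserves) the initial do-nothing filter.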
3.4. Newton’s Method for Solving the Regularized Optimization
Often, simple gradient ascent is slow to converge. A better way to tackle the optimization is to adopt Newton’s method when the second derivative (i.e., the Hessian matrix) of the objective function can be computed. Newton’s method uses the curvature information (second derivatives) of the objective function and generally converges faster: typically at a quadratic rate, compared with the linear rate of gradient ascent/descent. Once a neighborhood of the optimal solution is reached, a solution of adequately high accuracy can often be achieved in just a few iterations. Describing the details of how Newton’s method works is beyond the scope of this paper (see Reference [
37] for a review); rather, we will describe how it is used and report the Hessian we calculate for the filter design problem.
The Hessian for the filter design problem is a
$31\times 31$ matrix (assuming 31 sample wavelengths) and is denoted
${\nabla}^{2}\mu $. The term
${\nabla}_{ij}^{2}$ denotes the partial derivative
$\frac{{\partial}^{2}\mu}{\partial {\mathbf{f}}_{i}\,\partial {\mathbf{f}}_{j}}$. Again, with the derivation given in Reference [
36], the Hessian for our problem is calculated as:
where ∘ denotes the Hadamard product (or elementwise product) of two matrices. In the equation, the last term relates to the regularization term. Because we are adding a nonzero multiple,
$\alpha $, to the identity matrix, we prove in Reference [
36] that this ensures the Hessian is positive definite. Because the inverse of the Hessian drives Newton’s method, it is essential that the Hessian has full rank.
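The stabilising role of the $\alpha \mathbf{I}$ term is the familiar ridge effect: adding a positive multiple of the identity shifts every eigenvalue of a symmetric matrix upwards. A generic numpy illustration (using a random rank-deficient matrix, not the actual Hessian of our objective):

```python
import numpy as np

rng = np.random.default_rng(6)
B = rng.random((31, 5))
H = B @ B.T                          # symmetric PSD but rank 5: singular, not invertible
alpha = 1e-2
H_reg = H + 2 * alpha * np.eye(31)   # every eigenvalue shifted up by 2*alpha

eig_min = np.linalg.eigvalsh(H_reg).min()   # now strictly positive: H_reg is invertible
```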
Newton’s method for the problem of filter design is encapsulated in Algorithm 2. Here, given an initial guess
${\mathbf{f}}^{initial}$, we choose a step size after calculating the gradient and Hessian. Again, the step size is chosen using the backtracking line search procedure [
16]. We then update the filter solution using its gradient and Hessian:
${\mathbf{f}}^{k+1}:={\mathbf{f}}^{k}-{t}^{k}{\left[{\nabla}^{2}\mu \left({\mathbf{f}}^{k}\right)\right]}^{-1}\nabla \mu \left({\mathbf{f}}^{k}\right)$. The algorithm keeps refining the filter until a stopping criterion is met. In this paper, we simply run to an empirically determined maximum iteration number (e.g., 1000), by which point the algorithm is well converged.
Algorithm 2 Newton’s method for solving the optimization.
1: $k=0,\phantom{\rule{1.em}{0ex}}{\mathbf{f}}^{0}={\mathbf{f}}^{initial}$
2: repeat
3: Compute the gradient and Hessian: $\nabla \mu \left({\mathbf{f}}^{k}\right)$ and ${\nabla}^{2}\mu \left({\mathbf{f}}^{k}\right)$
4: Choose a step size: ${t}^{k}>0$
5: Update: ${\mathbf{f}}^{k+1}:={\mathbf{f}}^{k}-{t}^{k}{\left[{\nabla}^{2}\mu \left({\mathbf{f}}^{k}\right)\right]}^{-1}\nabla \mu \left({\mathbf{f}}^{k}\right)$
6: $k\leftarrow k+1$
7: until stopping criterion is satisfied
8: return ${\mathbf{f}}^{k}$

As we will see in the next section, Newton’s method converges much more quickly than gradient ascent. However, the implementation cost is significant: the $31\times 1$ gradient and the $31\times 31$ Hessian must be calculated at each step of the iteration. More onerously, the inverse of the Hessian must also be calculated (a computationally nontrivial operation).
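For completeness, here is an illustrative numpy transcription of Algorithm 2 on a small toy problem (10 sample wavelengths, random stand-in data). The analytic gradient and Hessian of Reference [36] are not reproduced in this paper, so the sketch approximates both by finite differences, and the objective is taken to be the regularized $\mu \left(\mathbf{f}\right)=3-3\nu \left(\mathbf{f}\right)+\alpha \,{\mathbf{f}}^{T}\mathbf{f}$.

```python
import numpy as np

def projector(A):
    return A @ np.linalg.pinv(A)

def make_mu(Q, PX, alpha):
    """Regularised objective mu(f) = 3 - 3*nu(f) + alpha f^T f (to be minimised)."""
    def mu(f):
        FQ = f[:, None] * Q
        return 3.0 - np.trace(projector(FQ) @ PX) + alpha * f @ f
    return mu

def num_grad(fun, f, h=1e-5):
    g = np.zeros_like(f)
    for i in range(f.size):
        e = np.zeros_like(f)
        e[i] = h
        g[i] = (fun(f + e) - fun(f - e)) / (2 * h)
    return g

def num_hess(fun, f, h=1e-4):
    """Finite-difference Hessian, a stand-in for the analytic one of Reference [36]."""
    n = f.size
    H = np.zeros((n, n))
    for i in range(n):
        e = np.zeros(n)
        e[i] = h
        H[:, i] = (num_grad(fun, f + e, h) - num_grad(fun, f - e, h)) / (2 * h)
    return (H + H.T) / 2.0              # symmetrise away finite-difference noise

rng = np.random.default_rng(7)
Q, X = rng.random((10, 3)), rng.random((10, 3))
mu = make_mu(Q, projector(X), alpha=1e-2)

f = np.ones(10)                         # f^0: the do-nothing filter
for _ in range(10):                     # damped Newton iterations
    g, H = num_grad(mu, f), num_hess(mu, f)
    step = np.linalg.solve(H, g)        # Newton direction [Hessian]^(-1) * gradient
    t = 1.0
    while mu(f - t * step) > mu(f) and t > 1e-8:
        t *= 0.5                        # backtracking line search
    if mu(f - t * step) > mu(f):
        break                           # no descent found; stop early
    f = f - t * step

start, final = mu(np.ones(10)), mu(f)
```

Each accepted Newton step does not increase $\mu $, so the run never ends worse than the do-nothing initialisation; in a real implementation, the analytic gradient and Hessian would replace the finite differences, making each iteration far cheaper.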
3.5. Adding Constraints to the Filter Optimization
By default, the filter found using the algorithms proposed in the previous subsections can be arbitrarily non-smooth (even with the regularization term) and it might also be very non-transmissive (see
Figure 3). Non-smoothness can limit the feasibility of filter fabrication. Moreover, it is perhaps unsettling that we do not (and cannot) predict how filter smoothness relates to the regularization penalty term. As well as being smooth, we may also wish a color filter to transmit at least a lower-bound percentage of the light that passes through. Indeed, a filter with too low transmittance values would, perforce, have limited practical utility.
In this paper, we have shown how we can maximize the Vora-Value either by employing a gradient ascent algorithm or, equivalently, by adopting a simple Alternating Least-Squares (ALS) algorithm in which we seek the filter such that the filtered sensitivities are close to being linearly transformable to an orthonormal basis of the human visual system sensitivities (i.e., an orthonormal linear combination of the XYZ CMFs). In previous work [
17], we have shown how the ALS approach can be modified—using Quadratic Programming (QP) [
38]—to incorporate smoothness and transmittance constraints.
While the QP solution is not the focus of this paper, we will include, for interest, an example of the constrained optimization in the next section. We refer the reader to Reference [
17] where details of the constrained ALS formulation are presented.
5. Conclusions
In this paper, we presented a unifying filter design method based on the underlying relation between the Luther condition and the Vora-Value. Mathematically, we proved that, for the filter design problem, the two approaches can be unified by using an orthonormal basis (a special basis set linearly transformed from the CMFs) as the target: the optimal filters for the two objectives coincide. Hence, we can find an optimized Luther filter that, in the least-squares sense, also maximizes the Vora-Value. We used three algorithms to solve for the best filter, namely alternating least-squares (ALS), gradient ascent and Newton’s method, the last of which converges fastest. We also found that the ALS approach has competitive performance compared to Newton’s method but has a much simpler formulation.
Experiments validate our method. For all cameras tested, we find the Vora-Value is significantly increased compared to the unfiltered cameras. In the color measurement test, we demonstrate that, using our optimized filters, the filtered camera sensitivities deliver significantly better color accuracy than the prior art.