On the Optimization of Regression-Based Spectral Reconstruction

Spectral reconstruction (SR) algorithms attempt to recover hyperspectral information from RGB camera responses. Recently, the most common metric for evaluating SR performance has been the Mean Relative Absolute Error (MRAE), an ℓ1 relative error (also known as percentage error). Unsurprisingly, the leading algorithms based on Deep Neural Networks (DNNs) are trained and tested using the MRAE metric. In contrast, the much simpler regression-based methods (which can in fact work tolerably well) are trained to optimize a generic Root Mean Square Error (RMSE) and then tested in MRAE. Another issue with the regression methods is that, because the linear systems in SR are large and ill-posed, they must be solved using regularization. However, hitherto the regularization has been applied at the spectrum level, whereas MRAE measures errors per wavelength (i.e., per spectral channel) and then averages them. This paper has two aims: first, to reformulate the simple regressions so that they minimize a relative error metric in training (we formulate both ℓ2 and ℓ1 relative error variants, where the latter is MRAE); and, second, to adopt a per-channel regularization strategy. Together, our modifications to how the regressions are formulated and solved lead to up to a 14% improvement in mean performance and up to 17% in worst-case performance (measured with MRAE). Importantly, our best result narrows the gap between the regression approaches and the leading DNN model to around 8% in mean accuracy.


Introduction
A consumer RGB camera captures light signals with three types of color sensors. Yet the light is physical radiance, a continuous spectral function of wavelength, which intuitively can hardly be described by a 3-dimensional color representation. Indeed, many researchers have found that real-world spectra are at least 5- to 8-dimensional [1][2][3][4][5]. Consequently, with RGB imaging we can only acquire limited information encoded in the light spectrum.
Historically, SR is most efficiently solved by Linear Regression (LR) [52], where the map from RGBs to spectra is described by a simple linear transformation. Considering a nonlinear map, the Polynomial Regression (PR) [53] and Root-Polynomial Regression (RPR) [56] methods expand the RGBs into a set of polynomial/root-polynomial terms, which are then mapped to spectra via a linear transform. More recent regression models, including the Radial Basis Function Network (RBFN) [54] and the A+ sparse coding algorithm [55], use clustering techniques to define the local neighborhood (in the color or spectral space) in which each RGB is regressed.
On the other hand, most recent SR algorithms are based not on simple regressions but on Deep Neural Networks (DNNs) [60][61][62][63][64][65][66], which embrace the idea of regressing RGB image patches as a whole (as opposed to regressing each pixel independently, as regressions do). This approach hypothesizes that object-level descriptions of the RGBs can, though requiring much more computational resources, aid the recovery of spectra. However, DNN-based models do not always perform better than simple regressions [55] and often suffer from instability issues when recovering spectra at different brightness scales [56][57][58]. Furthermore, spectra recovered by DNNs are shown to be less accurate in color [57,59] and do not necessarily provide more accurate cross-illumination or cross-device color reproductions [57].
In this paper, we aim to further optimize the existing regression-based SR methods. First, we noticed that in recent works, DNN-based models are evaluated and ranked by Mean Relative Absolute Error (MRAE) [63,64]. Most top DNN models are also designed to minimize MRAE directly [60][61][62][63][64]. However, all regressions used in SR are optimized by convention for Root Mean Square Error (RMSE) while evaluated using MRAE (or other similar relative errors) when compared with the DNNs [55][56][57]60,61]. In other words, these regressions are optimized for one metric, and then how well they work is evaluated with another.
In Figure 1, we illustrate the standard experimental framework of SR. In training, the parameters of the SR model are tuned such that the "losses"-the differences between the ground-truths and estimations measured by a given loss metric-are statistically minimized. After the SR models are trained, we evaluate them based on a desired evaluation metric. Ideally, the loss and evaluation metrics should match (i.e., the same or similar in nature). Indeed, a model that is optimized for one metric but evaluated by another will surely lead to sub-optimal results. Based on this insight, we propose two new minimization approaches for simple regressions-the Relative Error Least Squares (RELS) and Relative Error Least Absolute Deviation (RELAD). While the former minimizes an error similar to MRAE and is solved in closed form, the latter explicitly minimizes the MRAE metric but has the disadvantage of requiring an iterative minimization.
The second contribution of this paper is to propose a new way of regularizing regression-based spectral reconstruction. Most regressions are necessarily trained using a regularization constraint [67], both to prevent overfitting and to make the system of equations more "stable" [68] (a system of equations is stable if small perturbations in the training data result in only small perturbations in the solved-for model parameters). However, we observe that hitherto in regression-based spectral reconstruction all spectral channels are regularized together, e.g., in [52][53][54][55][56]. That is, the regularization constraint is effectively applied at the spectrum level. Yet, fundamentally, error metrics such as MRAE measure the errors at all spectral channels independently and then average them to give a spectral error measure. Thus, we propose a "per-channel" regularization methodology which ensures optimized regularization for each spectral channel independently.
Combined, we find that training the simple regressions to minimize the same error as used in testing and adopting a per-channel regularization approach lead to a significant uptick in performance. Moreover, as shown in the example hyperspectral image reconstruction results in Figure 2, our new regression methods deliver performance that is similar to one of the best (but much more complex) DNN approaches. The rest of the paper is organized as follows. In Section 2, we will introduce the prerequisites of regression-based spectral reconstruction. Our new methods are introduced in Section 3. Section 4 details the experimental procedures, and Section 5 discusses the experimental results. This paper concludes in Section 6.

Regression-Based Spectral Reconstruction
A simple mathematical model of RGB formation is written as [69]

x = ∫_Ω r(λ) s(λ) dλ , (1)

where r(λ) denotes the radiance spectrum, s(λ) = [s_R(λ), s_G(λ), s_B(λ)]^T is the set of three spectral sensitivities of the color sensors, and x = [R G B]^T is the derived color vector (the superscript T denotes the transpose operator). In the case of color imaging, the range of integration, Ω, is the visible range. In this paper, we consider this range to be the interval from 400 to 700 nanometers (nm). Further, we assume r(λ) and s(λ) are measured by a hyperspectral device at every 10 nm (in accordance with the recent SR challenge datasets [63,64]). As such, we can approximate the integration in Equation (1) by inner products [69]:

x = S^T r , (2)

where r = [r(400), r(410), · · · , r(700)]^T is the 31-component vectorized spectrum, and S is the 31 × 3 spectral sensitivity matrix (one spectral sensitivity per column).
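As a concrete check of the inner-product approximation in Equation (2), the discretized RGB formation can be sketched as follows. The Gaussian sensitivities and the flat spectrum here are illustrative placeholders, not the CIE curves used later in the paper:

```python
import numpy as np

# 31-channel sampling: 400-700 nm at 10 nm steps, as assumed in the paper.
wavelengths = np.arange(400, 701, 10)

# Toy sensor sensitivities: Gaussian bumps (placeholders, not CIE curves).
def gaussian(mu, sigma=40.0):
    return np.exp(-0.5 * ((wavelengths - mu) / sigma) ** 2)

S = np.stack([gaussian(600), gaussian(550), gaussian(450)], axis=1)  # 31 x 3

# A flat radiance spectrum, vectorized as in Equation (2).
r = np.ones(31)

# Equation (2): the integral of Equation (1) approximated by inner products.
x = S.T @ r  # 3-component color vector [R G B]^T
```

With a flat spectrum, each channel response is just the sum of that sensor's sensitivity samples.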
In SR, we study the inverse problem: we wish to recover r using the information of x. A general definition of SR (considering both regressions and DNNs) can be written as

r_k ≈ Ψ(N_k) , (3)

where r_k denotes the hyperspectral data at the location of interest (the kth pixel), Ψ denotes an SR algorithm, and the neighborhood set N_k includes the RGBs x_{k,l} in the spatial neighborhood of pixel k (the subscripts k, l indexing the lth pixel in the proximity of pixel k). Note that N_k also contains the RGB at the location of interest, i.e., x_k. In regression we normally do not use the neighborhood information to recover spectra (though small neighborhoods are occasionally admitted [55,56]). Rather, we attempt to map each RGB uniquely to a corresponding spectrum, i.e., Ψ(x_k) ≈ r_k.

Linear Regression
The simplest Linear Regression (LR) approach [52] models the SR algorithm Ψ by a linear transformation:

r_k ≈ M^T x_k , (4)

where M^T is a 31 × 3 matrix.
Considering the whole database of N spectra and their matching RGBs (regardless of which images they are in and their pixel locations), let us arrange them into the N × 31 spectral data matrix R = [r_1, r_2, · · · , r_k, · · · , r_N]^T and the matching N × 3 RGB data matrix X = [x_1, x_2, · · · , x_k, · · · , x_N]^T. Based on this nomenclature, we can rewrite Equation (4) as

R ≈ X M . (5)

That is, all spectra (the rows of R) are estimated from their corresponding RGBs (the rows of X) using the same linear transformation matrix M.

Nonlinear Regressions
The regression problem summarized in Equation (5) is, by construction, linear. In a general case where we may wish to pose the SR problem as a nonlinear regression, we might transform each row of X (each RGB) into an s-component counterpart using a given nonlinear function ϕ : R^3 → R^s:

X_ϕ = ϕ(X) (ϕ applied to each row of X) . (6)

This N × s transformed RGB matrix is then regressed to estimate the spectral data R using a new linear transformation matrix M_ϕ:

R ≈ X_ϕ M_ϕ . (7)

Here, the dimension of M_ϕ is s × 31.
For instance, if we consider the 2nd-order RPR, we will have ϕ(x) = [R, G, B, √(RG), √(GB), √(RB)]^T, in which case X_ϕ is an N × 6 matrix and the regression matrix M_ϕ is 6 × 31 [56].
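The 2nd-order root-polynomial expansion can be sketched as below. The six terms are taken from the RPR literature cited above; note that every term has degree one in the intensity scale, which is the design point of root-polynomials:

```python
import numpy as np

def rpr2_expand(X):
    """2nd-order root-polynomial expansion of N x 3 RGB data.

    Terms assumed from the RPR literature: R, G, B, sqrt(RG), sqrt(GB), sqrt(RB).
    Returns an N x 6 matrix (the X_phi of Equation (6))."""
    R, G, B = X[:, 0], X[:, 1], X[:, 2]
    return np.stack([R, G, B,
                     np.sqrt(R * G), np.sqrt(G * B), np.sqrt(R * B)], axis=1)

X = np.array([[4.0, 1.0, 1.0]])
X_phi = rpr2_expand(X)  # 1 x 6 expanded feature matrix
```

Because sqrt(kR · kG) = k · sqrt(RG), scaling the exposure by k scales every expanded term by k, so the expansion commutes with brightness changes.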

A+ Sparse Coding
So far, both linear and nonlinear regressions assume a single regression matrix that is used to estimate spectra for all RGBs. Yet in A+ sparse coding (i.e., adjusted anchored neighborhood regression) [55], it is considered that the SR mapping should be done locally [55,59]. Fundamentally, in sparse coding one assumes that all spectra can be represented by linear combinations of a small number of anchor spectra (hence it is "sparse") [55,70].
In training, the A+ algorithm runs the K-SVD clustering algorithm [71] and assigns the clusters' centers as the set of anchor spectra. Around the corresponding RGB value of each anchor, a fixed number of nearest RGB neighbors in the training dataset and their corresponding radiance spectra are recorded. These neighboring data points are then used to create a bespoke Linear Regression mapping (Equation (5)) that should be only used by the RGBs in the testing set that are around the same neighborhood [59]. That is, in the reconstruction stage of A+, we first find the anchor spectrum whose matching RGB is closest to the query RGB, and second we map the query RGB to spectrum using the regression mapping associated with that neighborhood.
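The reconstruction stage described above can be sketched as a nearest-anchor lookup followed by the anchor's linear map. The function name and the shapes of the inputs are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def a_plus_reconstruct(x, anchor_rgbs, anchor_maps):
    """Sketch of the A+ reconstruction stage.

    x           : query RGB, shape (3,)
    anchor_rgbs : K x 3 matrix of the anchors' matching RGBs
    anchor_maps : list of K per-neighborhood regression matrices (3 x 31),
                  each trained as in Equation (5) on the anchor's neighbors.
    """
    # Step 1: find the anchor whose matching RGB is closest to the query.
    k = int(np.argmin(np.linalg.norm(anchor_rgbs - x, axis=1)))
    # Step 2: map the query RGB to a spectrum with that neighborhood's map.
    return x @ anchor_maps[k]

# Tiny illustration with two hypothetical anchors.
anchors = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
maps = [np.zeros((3, 31)), np.ones((3, 31))]
r_hat = a_plus_reconstruct(np.array([0.9, 0.9, 0.9]), anchors, maps)
```

The query [0.9, 0.9, 0.9] is nearer the second anchor, so its map is used.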
We note that all the discussed regression models can be formulated by either Equation (5) or (7), and the structures of both are essentially the same. Thus, to simplify the notation, we will adopt the regression notation of Equation (5), i.e., we drop the subscript ϕ of Equation (7). As such, in later discussions the data we are regressing can be the RGBs or non-linear expansions thereof.

Least-Squares Minimization
Now let us consider how we solve for the regression matrix M. In essence, we must define what we mean by the ≈ symbol in Equation (5).
Most commonly, we seek to minimize the "sum-of-squares" between the estimations and ground-truths:

min_M ||X M − R||²₂ , (8)

where || · ||²₂ calculates the sum of all components squared. Equivalently, || · ||²₂ is sometimes written as || · ||²_F, denoting the Frobenius norm. Regressions solved in this manner are called Least-Squares (LS) regressions. Advantageously, the LS minimization can be solved in closed form [72].

The Overfitting Problem and Regularized Least Squares
However, the solution of Equation (8) often cannot be used directly in practice: given that the solved regression matrix M is optimized to minimize the errors in the training set, we often find that this matrix does not work for unseen data. In other words, denoting a set of unseen RGB and spectral data as X′ and R′, we often find that X′M does not approximate R′. This is the so-called "overfitting" problem [67,68].
One of the biggest problems of an overfitted regression is that it might not work in the presence of noise. As a thought experiment, we may consider recovering R (the same set of training spectral data) from the "perturbed" training RGB data, X̃ = X + E, where E is a matrix of very small numbers representing the noise that occurs in the RGB imaging process. It follows that an overfitted M can very possibly give X̃M = [X + E]M that fails to plausibly recover R.
Another facet of this problem is that if we actually attempt to find the best regression matrix for the noisy data [X + E], we can end up with an optimal regression matrix that is very different from the original one. That is, as we perturb our RGB data, we can arrive at very different regressions.
To mitigate the overfitting problem, the tool of ridge regularization (a.k.a. Tikhonov regularization) [67] is often incorporated when training a regression. While solving the minimization in Equation (8), we add a penalty term which bounds the magnitude of the regression matrix M:

min_M ||X M − R||²₂ + γ ||M||²₂ , (9)

where the solution for M can still be written in closed form [52].
Here, the γ parameter, which is set by the user, controls the trade-off between the ||M||²₂ penalty and the sum-of-squares fitting error. Clearly, if γ = 0, the equation reduces to the simple LS (Equation (8)); as γ becomes large, solving for an M with a small (bounded) norm becomes imperative and reducing the sum-of-squares fitting error becomes less important.
When the solved M has a bounded norm, the regression has the stability property we desire. That is, if we perturb X by a small amount we will still get the same (or a very similar) M, which in turn implies that despite the perturbation in the input RGBs we will still get similar spectral estimations. We refer the reader to the work in [67] for a fuller discussion of how regularization is used and why it solves the instability problem.
In practice, we choose the γ parameter empirically to best trade off the need for a stable solution against lowering the spectral fitting errors. Figure 3 illustrates a typical cross-validation parameter selection methodology [73]. First, a wide range of different γ's is tried, and the M's solved under these γ values are used to recover spectra in an unseen "validation set", which are then evaluated using the desired evaluation metric (here, we use the MRAE metric). Usually, a "U-shaped" curve is obtained when plotting the averaged validation-set MRAE against γ, where the minimal point (the red dot in the plot) indicates the selected parameter.
We note that the regularization parameter (the red dot in Figure 3) is not "fixed"; that is, we search for an independent regularization parameter in every minimization instance encountered. While the ridge regularization approach introduced here (adding an ℓ2 penalty term) is the most widely used approach (not only for SR but for learning problems in general), there are other regularization approaches in the literature, including the LASSO [74] (where an ℓ1 penalty term is used) and the Elastic Net [75] (where both ℓ1 and ℓ2 penalty terms are used).

RMSE versus MRAE Error Measures
In the prior art of SR, most (if not all) regression-based methods are based on LS minimization with ridge regularization (see, e.g., in [52][53][54][55][56]). As the LS minimization uses sum-of-squares as its loss metric (see Equations (8) and (9)), a natural metric for evaluating LS-based regressions is the Root Mean Square Error (RMSE):

RMSE(r̂, r) = √( ||r̂ − r||²₂ / 31 ) , (10)

(because the sum-of-squares is essentially the sum of the squared RMSEs). Here, we denote r̂ = Ψ(x) as the estimation (returned by, e.g., regressions or DNNs) of the ground-truth r.
Of course, the physical radiance spectra may have greater or lesser magnitudes depending on the imaging conditions [56]. However, if we model the change in brightness by a scalar k, we get RMSE(kr̂, kr) = k · RMSE(r̂, r). This means that for a ground-truth spectrum k times brighter than r, compared with the recovered spectrum also scaled by k (i.e., kr̂), we will get a k-times-larger RMSE than in the original case.
This biased nature of RMSE evaluation can likewise influence the training of regressions. Indeed, we can expect the standard LS minimization to overestimate the squared-RMSE (i.e., sum-of-squares) losses of bright spectra and consequently place more importance on minimizing their errors compared to the dim ones.
This brightness bias is part of the motivation underpinning the Mean Relative Absolute Error (MRAE) metric for spectral recovery [63,64]. This MRAE metric, almost universally adopted in the recent literature, is defined as

MRAE(r̂, r) = (1/31) Σ_{i=1}^{31} |r̂_i − r_i| / r_i , (11)

where the spectral difference is divided (component-wise) by the ground-truth spectral values. As such, the effect of brightness level is discounted when measuring spectral differences, i.e., MRAE(kr̂, kr) = MRAE(r̂, r).
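The contrast between the two metrics, including the scale behavior just described, can be verified numerically with a short sketch:

```python
import numpy as np

def rmse(r_hat, r):
    # Equation (10): root mean square error over the spectral channels.
    return np.sqrt(np.mean((r_hat - r) ** 2))

def mrae(r_hat, r):
    # Equation (11): absolute error divided component-wise by the ground truth.
    return np.mean(np.abs(r_hat - r) / r)

r = np.linspace(0.2, 1.0, 31)   # a toy 31-channel ground-truth spectrum
r_hat = 1.1 * r                 # a uniform 10% overestimate
k = 5.0                         # a brightness scaling factor

# RMSE scales with brightness; MRAE does not (and reads 0.1 here).
rmse_scaled = rmse(k * r_hat, k * r)
mrae_scaled = mrae(k * r_hat, k * r)
```

A uniform 10% overestimate yields MRAE = 0.1 at any brightness, whereas its RMSE grows linearly with k.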
Arguably, the MRAE makes more sense as a performance metric as the SR algorithm might be used where the absolute/maximum intensity of the tested spectra is not known or controlled. Therefore, the main goal of this paper is to consider how to modify the SR regression to minimize a relative recovery error (in general and MRAE in particular).
We would also like to emphasize that the development of our new methods (presented in the following sections) is, of course, tailored to the assumption that MRAE is used as the evaluation error metric. Moreover, MRAE is certainly the most commonly adopted metric for DNNs in the literature at this time; see, e.g., in [60,61,63,64].

Rethinking the Optimization of Regression for Spectral Reconstruction
The first part of our contributions concerns the regularization method used in the standard case (Equation (9)), where the regularization takes place at the spectrum level with all spectral channels being regularized together. Here, we will argue-and develop the requisite mathematics-that the regularization should be done per spectral channel.
In our second contribution, we alter the form of the regression. Given that MRAE is now used to evaluate and rank new methods, and that this metric is optimized directly in modern DNN solutions but not in regression-based methods, we reformulate the regressions to optimize fitting errors that are "relative" in nature (spectral differences relative to the ground-truth values). We develop solutions based on both ℓ2 and ℓ1 relative error minimizations.

Per-Channel Regularization
Returning to the conventional regression formulation in Equation (5), let us split the regression matrix M and the ground-truth spectral data R by columns:

M = [m_1, m_2, · · · , m_31] , R = [ρ_1, ρ_2, · · · , ρ_31] . (12)

Here, ρ_i denotes the ith column of R. Remember that each row of R is a single radiance spectrum; thus, the numbers in ρ_i are the ith-channel spectral intensities of all the spectra in the database. Now, we observe that the original multivariate regression is in fact a collection of 31 independent regressions:

ρ_i ≈ X m_i , i = 1, 2, · · · , 31 . (13)

Notice that Equation (13) is equivalent to the original formulation in Equation (5); it only makes explicit that there is no "inter-channel" dependence. In other words, for all regression-based SR in the literature, it has always been the case that each column of M is used only for recovering the corresponding column of R and is irrelevant to the recovery of the other spectral channels.
Curiously, when we solve for M using the standard LS minimization in Equation (9), the strength of the penalty term, γ||M||²₂, is controlled by a single parameter γ, meaning that all columns of M (the 31 m_i's) are regularized using the same γ parameter despite the fact that each works independently of the others. Essentially, by regularizing M as a whole, we are "asserting" an interdependence among channels.
From a mathematical viewpoint (regarding how the regression is formulated), it makes more sense to regularize each per-channel regression independently. Following Equation (13), we rewrite Equation (9) in a per-channel fashion:

min_{m_i} ||X m_i − ρ_i||²₂ + γ_i ||m_i||²₂ , i = 1, 2, · · · , 31 . (14)

Here, the per-channel regularization parameter γ_i (again optimized by a cross-validation procedure; see Section 2.2.1) is used specifically for regularizing the regression of the ith spectral channel. That is, we select different regularization parameters for different spectral channels.
We would like to remark that, although our per-channel approach matches the assumption made by the regression's formulation (that there is no inter-spectral-channel dependence), we admit the possibility that there might be better ways to formulate the regression that factor in a "reasonable interdependence" between channels. For example, we may consider imposing a "smoothness" constraint from the literature (see, e.g., in [76]) on the recovered spectra, though we note that this assumption would matter more for the recovery of "reflectance" spectra (which are intrinsically smooth) than for the "radiance" spectra we are recovering (because the illumination spectrum is part of the radiance, which can occasionally make the radiance far from smooth, especially under indoor illuminations).
The solution of Equation (14) can be written in closed form [67]:

m_i = (X^T X + γ_i I)^{−1} X^T ρ_i . (15)

Here, I is the identity matrix: if Linear Regression (Equation (5)) is used, then I is 3 × 3; if a nonlinear transform is incorporated (Equation (7)), I is s × s, where s is the dimension of the nonlinear feature vectors.
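A minimal sketch of Equation (15), looping over the channels with an independent γ_i for each (the function name is ours, and a linear solve replaces the explicit matrix inverse for numerical hygiene):

```python
import numpy as np

def per_channel_ridge(X, R, gammas):
    """Per-channel ridge regression, Equation (15).

    X      : N x s data matrix (RGBs or nonlinear expansions thereof)
    R      : N x C ground-truth spectral data (C = 31 in the paper)
    gammas : C per-channel regularization parameters gamma_i
    Returns the s x C regression matrix M, one column m_i per channel."""
    s = X.shape[1]
    XtX = X.T @ X
    M = np.empty((s, R.shape[1]))
    for i in range(R.shape[1]):
        # m_i = (X^T X + gamma_i I)^(-1) X^T rho_i, via a linear solve.
        M[:, i] = np.linalg.solve(XtX + gammas[i] * np.eye(s), X.T @ R[:, i])
    return M
```

With all γ_i = 0 and noiseless linear data, the solver recovers the generating matrix exactly, which is a handy sanity check.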

Relative-Error Least Squares Minimization
Now, let us minimize an error that is more similar to MRAE. First, we formulate an ℓ2 minimization problem so as to ensure a closed-form solution.
From Equation (5), we can remodel the approximation as the following minimization:

min_M || (X M − R) / R ||²₂ , (16)

where all divisions are component-wise. Here, the square of an ℓ2 relative error (referred to as the "relative-RMSE" in some works [55,70]) is minimized. We call this new minimization approach the Relative Error Least Squares (RELS). Equation (16) can be rewritten in a per-channel fashion:

min_{m_i} || (X m_i − ρ_i) / ρ_i ||²₂ , (17)

which is equivalent to

min_{m_i} || (X m_i) / ρ_i − 1 ||²₂ , (18)

for i = 1, 2, · · · , 31, where 1 is an N-component vector of ones. We can further define a matrix of RGBs weighted by the reciprocals of ρ_i:

H_i = diag(ρ_i)^{−1} X , (19)

where ρ_i = [ρ_{i,1}, ρ_{i,2}, · · · , ρ_{i,N}]^T. Using this nomenclature, let us again rewrite Equation (18) as

min_{m_i} || H_i m_i − 1 ||²₂ . (20)

Clearly, Equation (20) shows that RELS is in effect another least-squares problem, which regresses H_i to fit the vector 1.
Of course, we need to regularize this minimization, solving instead

min_{m_i} || H_i m_i − 1 ||²₂ + γ_i ||m_i||²₂ , (21)

whose closed-form solution is written as [67,77]

m_i = (H_i^T H_i + γ_i I)^{−1} H_i^T 1 . (22)
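The RELS solution for a single channel can be sketched directly from Equations (19)-(22); the function name is ours:

```python
import numpy as np

def rels_channel(X, rho, gamma):
    """RELS closed form for one spectral channel, Equation (22).

    X     : N x s data matrix
    rho   : N-vector of that channel's ground-truth intensities (rho_i)
    gamma : regularization parameter gamma_i
    """
    H = X / rho[:, None]                 # Equation (19): rows of X divided by rho_i
    s = X.shape[1]
    ones = np.ones(len(rho))
    # Equation (22): ridge-regress H_i onto the vector of ones.
    return np.linalg.solve(H.T @ H + gamma * np.eye(s), H.T @ ones)
```

On exact data (ρ_i = X m with positive entries) and γ_i = 0, the regression of H_i onto the ones vector recovers m exactly, since H_i m = 1 by construction.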

Relative-Error Least Absolute Deviation Minimization
Finally, let us consider minimizing MRAE directly. Analogous to Equation (16), we now solve the following minimization:

min_M || (X M − R) / R ||₁ , (23)

where || · ||₁ sums the absolute values of all components. Following the same derivation as in Equations (16)-(21), we reach

min_{m_i} || H_i m_i − 1 ||₁ + γ_i ||m_i||₁ . (24)

In the literature, regressions solved via an ℓ1 minimization are called Least Absolute Deviation (LAD) regressions [78,79]. As the MRAE we are minimizing here is a relative error, we call this new approach the Relative Error Least Absolute Deviation (RELAD).
Notice that for the regularization penalty term in Equation (24) we also adopted an ℓ1 norm, which corresponds to the LASSO regularization [74] (see Section 2.2.1).
Unlike RELS, RELAD does not have a closed-form solution. For a small amount of data, a globally optimal solution can be found using linear programming [78,80]. However, in our application the amount of data is large, so the Iterative Reweighted Least Squares (IRLS) [79] algorithm is more appropriate and is thus used here.
The IRLS process approaches RELAD minimization by repeatedly solving Weighted Least Squares (WLS) [81] while updating the weights on every iteration depending on the losses and mapping functions obtained in the previous iteration, until the solution converges.
The detailed algorithm is given in Algorithm 1. The iteration number is indicated by the superscript (t). All min, division, and absolute-value operations in Algorithm 1 are applied component-wise to vectors, while the median function in Step 8 and the mean functions in Step 11 operate over the components of δ^(t) (the resulting scalar σ̂ in Step 8 is a preliminary estimate of the standard deviation of the absolute losses, commonly used in the literature; see, e.g., in [79,82]). Furthermore, the min functions in Steps 9 and 10 clip the reciprocal values at 10^6 so as to prevent overly large numbers. Finally, in Step 11, we set the tolerance ε = 0.00005 and the stopping iteration T = 20.

Algorithm 1 Solving the RELAD regression (Equation (24)) by the IRLS algorithm.
1: Input: H_i, γ_i, tolerance ε, stopping iteration T
2: δ^(0) ← ∞ ▷ Absolute losses of all data are initialized to infinity
3: t = 0
4: repeat
5: t ← t + 1
6: m_i^(t) ← solution of the weighted form of Equation (24) ▷ Closed-form WLS solution
7: δ^(t) ← |H_i m_i^(t) − 1| ▷ Absolute losses
8: σ̂ ← median(δ^(t))/0.6745 ▷ Preliminary estimate of the standard deviation of δ^(t)
9: w^(t) ← σ̂ × min(1/δ^(t), 10^6) ▷ Data weights; the × operator is the scalar product
10: u^(t) ← min(1/|m_i^(t)|, 10^6) ▷ Penalty weights
11: until |mean(δ^(t)) − mean(δ^(t−1))| < ε or t = T
12: Return the converged m_i
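The IRLS loop above can be sketched in code. This is a hedged reconstruction in the spirit of Algorithm 1 rather than a line-by-line reproduction: the reweighting uses the standard clipped-reciprocal scheme for ℓ1 terms, and the exact robust scaling of the original is not reproduced:

```python
import numpy as np

def relad_irls(H, gamma, T=20, eps=5e-5, cap=1e6):
    """IRLS sketch for Equation (24): min ||H m - 1||_1 + gamma ||m||_1.

    Both l1 terms are majorized by the usual reweighting |v| ~ v^2 / |v|,
    with reciprocal weights clipped at `cap` (10^6, as in Algorithm 1)."""
    N, s = H.shape
    ones = np.ones(N)
    # Plain ridge start (weights implicitly uniform on the first pass).
    m = np.linalg.solve(H.T @ H + gamma * np.eye(s), H.T @ ones)
    prev_mean = np.inf
    for _ in range(T):
        delta = np.abs(H @ m - ones)                # absolute losses
        w = 1.0 / np.maximum(delta, 1.0 / cap)      # data weights, clipped at cap
        u = 1.0 / np.maximum(np.abs(m), 1.0 / cap)  # l1-penalty weights, clipped
        # Closed-form weighted least-squares step.
        m = np.linalg.solve(H.T @ (w[:, None] * H) + gamma * np.diag(u),
                            (H.T * w) @ ones)
        mean_loss = np.mean(np.abs(H @ m - ones))
        if abs(prev_mean - mean_loss) < eps:        # convergence on the mean loss
            break
        prev_mean = mean_loss
    return m
```

On consistent data (H m = 1 achievable) the iteration settles at the exact solution, since the initial least-squares fit already attains zero loss.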

The Regression Models
The regression models we consider in this paper (introduced in Section 2.1) are listed in Table 1. For each of these regressions, we train the model under the 4 minimization criteria listed in Table 2. That is, we benchmark 5 models paired with 4 minimization approaches; but, irrespective of which minimization we are using, we evaluate the performance of all algorithms using MRAE.
All implemented codes are provided as the Supplementary Materials.

Preparing Ground-Truth Datasets
In this paper, we adopt the ICVL hyperspectral image database [70], which was the database used for the NTIRE 2018 Spectral Reconstruction Challenge [63]. Each spectral image in the database (200 in total) has a spatial dimension of 1392 × 1300 (approximately 1.8 M spectra per image) with 31 spectral channels (10-nm spectral sampling between 400 and 700 nm).
For each spectral image, we calculate the corresponding RGB image following Equation (2), using the CIE 1964 color matching functions [83,84] as the camera's spectral sensitivities S, i.e., the resulting RGBs are the CIEXYZ tristimulus values [84]. This setting corresponds to the "Clean Track" methodology of NTIRE 2018 and 2020 Spectral Reconstruction Challenge [63,64].

Training, Validation, and Testing
We follow a 4-trial cross-validation process. First, we randomly separated all images (pairs of hyperspectral and corresponding RGB images) into 4 data subsets, denoted image subsets A, B, C, and D, each consisting of 50 scenes. Then, in each trial, 2 subsets were used for training, 1 for validation, and 1 for testing, and the roles of the subsets were permuted across trials. In the training process of each trial, the training set was used to minimize either the LS, LS pc, RELS, or RELAD criterion.
Then, the validation set was used to determine the unknown regularization parameters following the procedure introduced in Section 2.2.1. More specifically, we first coarsely tried a wide range of magnitudes: γ or γ i = {10 −20 , 10 −19 , · · · , 10 20 }, and then selected the magnitude that returns the lowest averaged validation-set MRAE. Next, we conducted a finer search at 1000 points (equidistant on the log scale) between the two adjacent magnitudes of the coarsely selected one, which determines the final selection of γ or γ i .
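The coarse-then-fine search just described can be sketched as follows; `solve` and `evaluate` are hypothetical callables standing in for fitting a model at a given γ and computing its validation-set MRAE:

```python
import numpy as np

def select_gamma(solve, evaluate, n_fine=1000):
    """Coarse-to-fine regularization-parameter search (sketch).

    solve    : gamma -> fitted model (hypothetical)
    evaluate : model -> validation-set MRAE (hypothetical)
    """
    # Coarse pass: magnitudes 10^-20 ... 10^20.
    coarse = 10.0 ** np.arange(-20, 21)
    best = coarse[np.argmin([evaluate(solve(g)) for g in coarse])]
    # Fine pass: n_fine points, equidistant on the log scale, between the
    # two magnitudes adjacent to the coarse winner.
    fine = np.logspace(np.log10(best) - 1, np.log10(best) + 1, n_fine)
    return fine[np.argmin([evaluate(solve(g)) for g in fine])]
```

As a smoke test, a validation error that is quadratic in log10(γ) with its minimum at 10^0.3 is located to within the fine grid's spacing.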
For the deep learning-based HSCNN-R, the validation step is different. In HSCNN-R, we use the validation set to determine the termination epoch of the training [60]. The actual termination epochs adopted differ among the 4 cross-validation trials, but all of them are between 315 and 350 epochs.
Finally, we evaluated all models and minimization criteria using the testing-set images. The resulting statistics presented in the next section are the averaged testing results over the 4 cross-validation trials.

Mean and Worst-Case Performance
In Table 3, we present the mean statistics of per-image-mean and per-image-99-percentile (worst-case) MRAE over the 200 tested images. Let us first look at the numbers in the first row, which are results of variations of LR (i.e., Linear Regression). The mean results (under the headline "Mean per-image-mean MRAE") show that LR trained using any of our three new approaches outperforms the standard LS (least-squares), among which the RELAD method performs best, returning 14% lower MRAE compared to LS. On the right side of Table 3 (headlined "Mean per-image-99-percentile MRAE"), we see that RELS-based LR provides 17% lower worst-case MRAE than the standard LS. Likewise, for all other regression models, we found that the best minimization criterion in terms of mean MRAE is either RELS or RELAD, depending on the model.
To further examine the robustness of the best minimization approach against the other tested approaches, we present the "paired" (or "dependent") two-sample Student's t-test scores [85] in Table 4. The t-test scores are calculated from the 200 pairs of per-image-mean MRAEs delivered by the two compared minimization criteria. For example, consider the top-left number of the table: 9.46 is the t-test score when comparing the "Best" approach for LR (i.e., RELAD, which delivers the lowest mean per-image-mean MRAE for LR) against the conventional LS minimization. In the same row, we also see the t-test scores comparing the Best approach with LS pc and RELS, respectively, but not RELAD (because RELAD is the Best criterion itself). In essence, the larger the absolute value of the t-test score, the more significantly the two distributions differ (i.e., the more statistically significant the advantage in mean performance of the Best criterion). In our case, all tests have 200 − 1 = 199 degrees of freedom (200 being the number of dependent sample pairs). Accordingly, with a "one-sided" hypothesis (as we only want to test whether the mean of the Best criterion is smaller), a 5% significance level corresponds to a critical t-score of 1.65 (according to the t-distribution table in [86]). Evidently, the t-scores of all our tests exceed this 1.65 threshold. In a practical sense, this result suggests that for each regression model, using the corresponding best minimization criterion consistently delivers the lowest per-image-mean MRAEs.
Still, it is somewhat counterintuitive that RELAD performs worse than RELS in some cases. Indeed, as RELAD directly minimizes MRAE, we might expect it to always provide the lowest mean MRAE. This outcome likely originates from the imprecision of IRLS specifically when solving LAD (i.e., ℓ1 Least Absolute Deviation) minimizations [79,87,88]. Another possible cause of suboptimal optimization is that the IRLS algorithm might, in some cases, fail to converge within 20 iterations (remember that we set the stopping iteration T = 20 in Step 11 of Algorithm 1). We stopped after 20 iterations to limit the computational cost of the problem.
In contrast, it is not entirely surprising that RELS has better worst-case performance than RELAD (and consequently RELS is the best method for all tested regressions in terms of worst-case MRAE). Indeed, it is well known that ℓ1 minimizations tend to be less sensitive to outliers than their ℓ2 counterparts [78,79]. This means that in RELAD these 99th-percentile pixels may be treated as outliers during training, i.e., minimizing their errors is given less importance, and thus they perform worse in testing.
Finally, it is shown that our best-performing regression model, the RELS-based PR, is only 8.7% worse than HSCNN-R in terms of mean MRAE. We remind the reader that HSCNN-R is one of the top models in the NTIRE 2018 challenge [63], in which all finalists are based on DNNs. Given that most of the reported challenge entries were much more than 10% worse than HSCNN-R under the MRAE evaluation, we can expect that some of our further optimized regression models would be on par with, or even better than, many DNN models in the challenge.
This result contradicts the assumption, made by many DNN approaches, that mapping large image patches is necessary for achieving top performance, as all regressions tested in our experiment are pixel-based. Furthermore, considering how many fewer parameters the regressions use compared to the DNNs (e.g., the PR regression model has 2573 parameters, whereas HSCNN-R has approximately 10^7), it is surprising that regression methods can achieve performance comparable to the DNNs (let alone better than some). Since DNNs have millions of parameters, it is likely that the training data are insufficient to robustly optimize them (the ICVL database [70] used in our study and in the NTIRE 2018 challenge [63] is already among the largest hyperspectral image databases available to date).

Computational Time
The training and reconstruction times of all tested models are presented in Table 5. We calculate the training time as the average time per cross-validation trial, and the reconstruction time as the average time to reconstruct one tested image. Our hardware comprises an Intel® Core™ i7-9700 CPU and an NVIDIA® GeForce® RTX 2080 SUPER™ GPU. Note that the GPU is only used for training the HSCNN-R model; all testing (reconstruction) processes and the training of the regressions use the CPU alone.
Let us first look at the training time (the left side of Table 5). Regardless of the regression model, the closed-form LS, LS_pc, and RELS approaches train far faster than the iteratively solved RELAD approach and HSCNN-R. Then, although the RELAD minimization generally takes longer than training HSCNN-R, we remark that HSCNN-R training is GPU-accelerated (and thus heavily parallelized), whereas our implementation of RELAD does not use any explicit parallel programming.
As for the reconstruction time (the right side of Table 5), all tested regression methods require much less time to reconstruct a spectral image than HSCNN-R. Additionally, the reconstruction time of the regression models does not depend on the adopted minimization approach. This makes sense: no matter how the regression matrix was optimized, the reconstruction procedure is the same. Consequently, when longer "offline" training time is permitted, the RELAD-optimized regression still supports reconstruction as fast as LS, LS_pc, and RELS.

Brightness Dependence
Let us further investigate the performance discrepancy within an image. In Figure 4, we show, for each regression approach, the MRAE recovery error map of an example scene. We see that RELS and RELAD tend to improve the spectral recoveries of the foreground objects, e.g., the trees and grass in the LR results, the flowers in the RPR results, the leaves in the A+ results, the bonsai pot in the RBFN results, and the green peppers in the PR results. Yet, in the background and/or highlight regions (e.g., sandy grounds and the blue sky), LS and LS_pc in turn outperform RELS and RELAD. We observe that, in the scenes of the ICVL database, the foregrounds are generally dimmer than the backgrounds. We therefore presume that LS and LS_pc recover the spectra of backgrounds and highlights more accurately (yet perform much worse in the foregrounds) because of the brightness bias of least-squares minimization (see Section 2.3).
In Figure 5, we show how the mean MRAE performance of each regression approach relates to pixel brightness. Here, "brightness" is defined as the ℓ2 norm of the ground-truth spectrum. For each image, we separated all pixels into 4 brightness groups: 0-25 percentile (the dimmest group), 25-50 percentile, 50-75 percentile, and 75-100 percentile (the brightest group). Then, for the pixels belonging to the same brightness group (pooled over all testing-set images), we calculated the mean MRAE presented in the plots. Clearly, while LS and LS_pc generally provide lower MRAE for the brightest group (75-100%), RELS and RELAD demonstrate a clear advantage over LS and LS_pc in the two dimmest groups (0-25% and 25-50%), which ultimately leads to their better overall performance.
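The grouping procedure above can be sketched as follows. This is an illustrative reimplementation (function names are our own), assuming pixels arranged as rows and MRAE averaged over spectral channels per pixel:

```python
import numpy as np

def mrae(gt, rec):
    # Per-pixel MRAE: mean over spectral channels of |rec - gt| / gt.
    return np.mean(np.abs(rec - gt) / gt, axis=-1)

def mean_mrae_by_brightness(gt, rec):
    """Split pixels into brightness quartiles (l2 norm of the ground-truth
    spectrum) and return the mean per-pixel MRAE of each group."""
    brightness = np.linalg.norm(gt, axis=-1)
    edges = np.percentile(brightness, [0, 25, 50, 75, 100])
    errs = mrae(gt, rec)
    group_means = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (brightness >= lo) & (brightness <= hi)
        group_means.append(errs[mask].mean())
    return group_means
```

(The boundary pixels at each percentile edge are counted in both adjacent groups here; this detail is immaterial for the sketch.)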

Conclusions
Spectral reconstruction (SR) recovers high-resolution radiance spectra from RGB images. Many methods are regression-based, with simple formulations and usually closed-form solutions, while the current state-of-the-art spectral recovery is delivered by the much more sophisticated Deep Neural Network (DNN) solutions.
The top DNN models are now trained and evaluated using the Mean Relative Absolute Error (MRAE), a relative error which measures the spectral difference as a percentage of the ground-truth spectral value. By comparison, the regressions are still trained with Least Squares (LS) minimization, which does not guarantee a minimal MRAE. This problem is further compounded by the suboptimal regularization setting of conventional regressions, where all spectral channels are jointly regularized by a single penalty parameter.
In this paper, we developed new regression approaches that minimize relative errors and are regularized per spectral channel: the closed-form Relative-Error Least Squares (RELS) approach and the Relative-Error Least Absolute Deviation (RELAD) approach (which directly minimizes MRAE and is solved by an iterative method).
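To illustrate the flavor of the closed-form, per-channel-regularized approach, the sketch below treats each spectral channel's relative-error least squares as a weighted ridge regression with weights 1/R_ik^2 and its own penalty γ_k. This is a plausible reconstruction under that assumption, not the paper's exact formulation, and the function name is ours:

```python
import numpy as np

def rels_per_channel(X, R, gammas):
    """Per-channel relative-error least squares (illustrative sketch).

    X      : (n_pixels, d) RGB (or polynomially expanded) camera responses
    R      : (n_pixels, c) ground-truth spectra, one column per channel
    gammas : (c,) per-channel ridge penalties

    For channel k, minimizing sum_i ((x_i^T m_k - R_ik) / R_ik)^2
    + gamma_k * ||m_k||^2 is a weighted ridge regression with weights
    1 / R_ik^2, solved channel by channel in closed form.
    """
    n, d = X.shape
    M = np.zeros((d, R.shape[1]))
    for k, g in enumerate(gammas):
        w = 1.0 / R[:, k] ** 2           # relative-error weights
        Xw = X * w[:, None]
        M[:, k] = np.linalg.solve(X.T @ Xw + g * np.eye(d), Xw.T @ R[:, k])
    return M
```

The per-channel loop is the key structural point: each channel k gets its own penalty γ_k rather than a single penalty shared across the whole spectrum.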
Our results showed that the new minimization approaches significantly improve upon the conventional regressions, especially in the darker regions of the images. Consequently, our best improved regression model narrows the performance gap with the leading DNNs to only 8% under the MRAE evaluation.