Physically Plausible Spectral Reconstruction

Spectral reconstruction algorithms recover spectra from RGB sensor responses. Recent methods—with the very best algorithms using deep learning—can already solve this problem with good spectral accuracy. However, the recovered spectra are physically incorrect in that they do not induce the RGBs from which they are recovered. Moreover, if the exposure of the RGB image changes then the recovery performance often degrades significantly—i.e., most contemporary methods only work for a fixed exposure. In this paper, we develop a physically accurate recovery method: the spectra we recover provably induce the same RGBs. Key to our approach is the idea that the set of spectra that integrate to the same RGB can be expressed as the sum of a unique fundamental metamer (spanned by the camera’s spectral sensitivities and linearly related to the RGB) and a linear combination of a vector space of metameric blacks (orthogonal to the spectral sensitivities). Physically plausible spectral recovery resorts to finding a spectrum that adheres to the fundamental metamer plus metameric black decomposition. To further ensure spectral recovery that is robust to changes in exposure, we incorporate exposure changes in the training stage of the developed method. In experiments we evaluate how well the methods recover spectra and predict the actual RGBs and RGBs under different viewing conditions (changing illuminations and/or cameras). The results show that our method generally improves the state-of-the-art spectral recovery (with more stabilized performance when exposure varies) and provides zero colorimetric error. Moreover, our method significantly improves the color fidelity under different viewing conditions, with up to a 60% reduction in some cases.


I. INTRODUCTION
Hyperspectral imaging devices are developed to capture high-resolution radiance spectra at every pixel in an image, producing what are known as hyperspectral images. These images often record additional scene information that is 'invisible' to human eyes and consumer RGB cameras (where the spectral information is recorded with only 3 intensity values per pixel). This information has been found useful in numerous computer vision applications including remote sensing [40], [11], [19], [38], [10], anomaly detection [23] and medical imaging [44], [45], as well as computer graphics applications such as scene relighting [25] and digital art archiving [43].
A key concern of this paper is the physical plausibility of spectral reconstruction algorithms. If we physically measure the radiance spectra with an accurate hyperspectral camera then, given the 3 spectral sensitivity functions of an RGB camera, we are guaranteed to reproduce the RGBs which the RGB camera will actually give. Unfortunately most spectral reconstruction algorithms, including all those based on deep neural networks, do not ensure this property. Indeed, as we shall show in this paper, the predicted RGB can be quite far from the actual one. This is not just 'unfortunate': it misses one of the key reasons we would like to use spectral measurements, which is to predict what we see, for example to better predict our own color sensation when the viewing conditions change (i.e. the color correction problem).
In Figure 1 we illustrate the idea of 'physical plausibility'. The physically plausible spectral reconstruction is illustrated in the top diagram. Here a radiance spectrum (i.e. a spectral power distribution r(λ), see the right side of the image) is recovered from an RGB (the red point on the left in RGB space). Now we reintegrate the spectrum with the camera's spectral sensitivities (that is, we simulate taking a picture of this spectrum), which gives a predicted RGB. In this case the input RGB and the predicted counterpart are the same.
Fig. 2: The color fidelity test on the artist Matisse's famous painting 'Jazz' for two reconstructed spectra with similar spectral error.
The diagram in the bottom half of Figure 1 shows a spectral recovery which produces an incorrect color when reintegrated with the camera sensitivities. This 'physically non-plausible spectral reconstruction' is the norm (and is a feature exhibited by all deep network based algorithms we are aware of). Put bluntly, these algorithms provide estimates of spectra which, because they do not reintegrate to the input RGB, must be the wrong answers.
In Figure 2 we show a pictorial example. In the bottom-middle panel we show two reconstructed spectra (the red and purple dotted curves) having similar spectral difference from the ground-truth (blue solid curve). By integrating with the sRGB display color matching functions given in the middle-top panel, the purple curve reproduces the background color exactly as in the original painting on the left. In contrast, the red curve reproduces the image on the right, which shows a significant background color shift.
The examples illustrated in Figures 1 and 2 show the advantage of incorporating the physics of image formation into learning-based methods for spectral reconstruction. Another related issue we consider in this paper is exposure invariance. Clearly, if the light intensity in the scene changes, the ground-truth radiance spectra will be linearly scaled, and for linear RGB images (i.e. the camera raw data), the RGBs will also be scaled in the same way. However, as we later show in this paper, this second physical reality is also not preserved in the state-of-the-art CNN models: these networks were trained for a single exposure condition and they perform poorly when a different exposure setting is tested.
This paper makes three main contributions:
• We evaluate the state-of-the-art HSCNN-D and HSCNN-R models [5], [35] to gauge the extent to which they deliver physically plausible spectral recovery, either in the sense of predicting the input RGBs or being resilient to a varying exposure.
• We propose a novel framework which ensures exact color reproduction in CNN-based spectral reconstruction.
• We design a data augmentation process that maintains model stability over different exposure settings.
The rest of the paper is organized as follows. In section 2 we review the related field. In section 3 we show how we can solve the spectral reconstruction problem while ensuring physical plausibility. Implementation details are given in section 4. Experimental results are presented in section 5. The paper concludes in section 6.

II. RELATED WORK
Hyperspectral imaging. There exist technologies by which hyperspectral images can be directly captured, including a prism-mask system [8], multiple cameras [41], [31] and faced reflectors [37]. However, the practical application of these devices is limited by their complex configurations and/or their physical bulkiness. Alternatively, in compressive imaging, a scene's spectral information is encoded in alternative forms on the sensed 2-D images, but there is the overhead of decompressing the signal. Examples include multi-spectral color filter array [12], coded aperture [6], [16], [17], diffractive gratings [27], digital micro-mirror device [33] and most recently random printed mask [47]. Other problems inherent in compressive sensing are the need for specialized optics and the trade-off between the number of sensors and/or light sensitivity and the spatial resolution.
Spectral reconstruction (SR). Rather than building new hardware for capturing hyperspectral images, spectral reconstruction attempts to map RGB images to their spectral counterparts. Shallow-learned methods, of which sparse coding is the best example [4], [1], have the advantage of model simplicity and quick training. However, these models effectively implement a 'one-to-one' lookup table, which contradicts the fact that many (in fact infinitely many) spectra can reproduce the same RGB.
In the CNN approach the implementation complexity is much higher, as are the hardware requirements, but the reconstruction is richer. The promise of these methods is that, in an intermediate representation, they might identify scene contents which are associated with the target spectra and then effectively use this information in the recovery process. Indeed, it is well known that faces, chlorophyll (in foliage) and daylights have very characteristic spectral shapes (amongst other scene features). Of the current developments, deep neural networks [3], [24], [36], [15], [35] provide the leading performance in spectral reconstruction.
Physical plausibility. In this paper we stress that, as an alternative way of obtaining hyperspectral information, a spectral reconstruction algorithm should always produce physically plausible radiance predictions, i.e. predictions that reintegrate to the same RGB values from which they were recovered.
Interestingly, some of the early models can already provide accurate color reproduction (but much poorer spectral recovery compared to the recent CNN methods). Examples include weighted PCA based on color differences [2] and colorimetric correction of the linear regression spectral recovery (i.e. pseudo-inverse) [46]. Furthermore, sparse coding methods [4], [1] can also provide rather accurate color reproduction by virtue of their fundamental 'neighbor embedding' assumption [39]. Finally, a complex and computationally laborious Bayesian inference method [29] has also been introduced in which physical plausibility is ensured.
However, in the recent NTIRE 2018 Challenge on Spectral Reconstruction from RGB Images (hereinafter abbreviated as NTIRE2018) [5], all 12 leading entries out of 73 participants (on the 'Clean Track') involve the implementation of deep neural networks. None of these methods explicitly ensures that the spectra reintegrate to the input RGBs.
Exposure invariance. In many learning-based computer vision tasks, model stability under intensity change is considered; that is, the model is ensured to work well even as the scene exposure changes. However, Lin and Finlayson [28] demonstrated that leading spectral reconstruction models in NTIRE2018 perform poorly in different exposure settings, and this has raised a concern that many modern developments of spectral reconstruction may not work in the wild where exposure can vary.

III. PHYSICALLY PLAUSIBLE SPECTRAL RECONSTRUCTION
At each pixel of a hyperspectral image, a high-resolution radiance spectrum is recorded. The corresponding RGB image is simulated by calculating the inner products between the measured radiance spectra and the spectral sensitivity functions of the RGB camera:
$$\rho_k = \sum_{\lambda \in \Omega} s_k(\lambda)\, r(\lambda), \tag{1a}$$
where k = 1, 2, 3 refers to the red, green and blue channels of the RGB image, $\rho_k$, $s_k(\lambda)$ and $r(\lambda)$ are respectively the k-th camera response, the k-th camera sensitivity function and the radiance function, λ denotes the wavelength dimension, and Ω is the visible spectrum. Of course for this inner-product model (as opposed to an integral) of image formation to work, we must sample the spectra at a sufficient resolution across the visible spectrum. In all simulations we report later in this paper, we assume the visible spectrum runs from 400 through 700 nanometers, and the spectra are sampled every 10 nanometers (this is the common assumption made in most studies, including the NTIRE2018 [5]).
Let us vectorize the above equation:
$$\rho = S^T r, \tag{1b}$$
where $\rho = (\rho_1, \rho_2, \rho_3)^T$ is the 3-dimensional RGB vector, r is the n-dimensional radiance spectrum with n the number of spectral bands, and $S = (s_1, s_2, s_3)$ is an n × 3 matrix whose columns are the three distinct camera sensitivity functions.
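The image formation model above (one inner product per channel, or equivalently ρ = Sᵀr in matrix form) can be sketched in a few lines of NumPy. This is only an illustrative sketch: the Gaussian sensitivities and the smooth test spectrum are hypothetical stand-ins, not the sensitivities or data used in the paper; only the 400 to 700 nm grid with 10 nm sampling comes from the text.

```python
import numpy as np

# Wavelength grid stated in the text: 400-700 nm sampled every 10 nm (n = 31).
wavelengths = np.arange(400, 701, 10)
n = wavelengths.size


def gaussian(center, width):
    """Hypothetical bell-shaped sensitivity curve (an assumption for illustration)."""
    return np.exp(-0.5 * ((wavelengths - center) / width) ** 2)


# S is n x 3: one column per channel (red, green, blue).
S = np.stack([gaussian(600, 40), gaussian(550, 40), gaussian(450, 40)], axis=1)

# Hypothetical smooth, positive radiance spectrum r (an n-vector).
r = 0.5 + 0.3 * np.sin(wavelengths / 50.0)

# Channel-wise form: one inner product per sensitivity function.
rho_per_channel = np.array([np.dot(S[:, k], r) for k in range(3)])

# Vectorized form: rho = S^T r.
rho = S.T @ r
```

Both forms give the same 3-vector; the matrix form is the one used in the derivations that follow.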
In the ordinary spectral reconstruction framework, the radiance spectrum r is recovered from the RGB camera response ρ: the spectral reconstruction algorithm searches for the best solution to r within the entire spectral space (i.e. $\mathbb{R}^n$) that statistically minimizes the distance error between the recovered and ground-truth radiance spectra. However, this framework does not ensure that the reconstructed r reproduces ρ: the algorithm may find a solution which is spectrally close to the ground-truth but reproduces a distant color (as per the example we showed in Figure 2).
Let us now develop a method to constrain the algorithm to search for the estimated radiance only within the set of spectra that integrate to the correct RGB. For this purpose, we propose a plausible set concept, which is defined as the set of all spectra that integrate to a target RGB.
The derivation of our plausible set is analogous to, but simpler than, the metamer set in [14], [29], while their focus was on the reflectance set instead of our case on the radiance set.

A. The Plausible Set
Given known camera sensitivity functions S, the plausible set $\mathcal{P}$ is defined as:
$$\mathcal{P}(\rho; S) = \{\, r \in \mathbb{R}^n \mid S^T r = \rho \,\}.$$
Geometrically, the outcome of an inner product is only affected by the parts of the two vectors that are 'parallel' to each other, whereas the 'perpendicular' part does not contribute to the product. Given this view, the constraint $S^T r = \rho$ in effect separates r into two parts: the part that is spanned by the column vectors of S, which contributes to ρ, and the part that lies in the null-space of S, which yields zero projection. That is,
$$r = r^{\parallel} + r^{\perp}$$
subject to
$$S^T r^{\parallel} = \rho \tag{4a}$$
and
$$S^T r^{\perp} = 0. \tag{4b}$$
The $r^{\parallel}$ component can be derived directly by subspace projection. The projection matrix with respect to S is written as:
$$P^S = S (S^T S)^{-1} S^T, \tag{5}$$
such that
$$r^{\parallel} = P^S r. \tag{6}$$
Next, we enforce our desired constraint $S^T r = \rho$ on $r^{\parallel}$, which gives
$$r^{\parallel} = S (S^T S)^{-1} \rho. \tag{7}$$
It is important to see that the derived $r^{\parallel}$ is fixed given S and ρ, which implies that all $r \in \mathcal{P}(\rho; S)$ share the same $r^{\parallel}$ and only the $r^{\perp}$ component distinguishes the radiances in the set $\mathcal{P}$.
On the other hand, $r^{\perp}$ can be any vector in the null space of S, which is spanned by n − 3 linearly independent bases that are orthogonal to all 3 column vectors of S. This set of bases can be obtained by finding the non-trivial solutions of $r^{\perp}$ in Equation (4b), or by calculating the n − 3 basis vectors that span $I_{n \times n} - P^S$ ($I_{n \times n}$ is the n × n identity matrix), which is the projection matrix with respect to all other dimensions in $\mathbb{R}^n$ that are orthogonal to the column space of S. Either way, we get a set of n − 3 null-space bases, and every $r^{\perp}$ can be uniquely expressed as a linear combination of them:
$$r^{\perp} = N \alpha, \tag{8}$$
where N is an n × (n − 3) matrix whose columns are the null-space basis vectors. We call α the null-space coefficients.
Finally, we reach the following definition of $\mathcal{P}$:
$$\mathcal{P}(\rho; S) = \{\, S (S^T S)^{-1} \rho + N \alpha \mid \alpha \in \mathbb{R}^{n-3} \,\}, \tag{9}$$
where S, N and ρ are all known, leaving α as the only variable within the plausible set $\mathcal{P}$.
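A minimal numerical sketch of the plausible set follows: the camera-subspace component is computed as S(SᵀS)⁻¹ρ, a null-space basis N is taken from the singular value decomposition of S, and any choice of α yields a spectrum that integrates to the same ρ. The random matrix S and the target ρ are hypothetical placeholders, and taking N from the SVD is one convenient choice among many valid bases, not necessarily the one used by the authors.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 31

# Hypothetical sensitivities (n x 3, full column rank) and target RGB.
S = rng.random((n, 3))
rho = np.array([0.4, 0.5, 0.3])

# Camera-subspace component: r_par = S (S^T S)^{-1} rho.
r_par = S @ np.linalg.solve(S.T @ S, rho)

# Null-space basis: the last n - 3 left singular vectors of S are
# orthogonal to every column of S, so S^T N = 0.
U, _, _ = np.linalg.svd(S, full_matrices=True)
N = U[:, 3:]

# Any null-space coefficients alpha give a member of the plausible set.
alpha = rng.standard_normal(n - 3)
r = r_par + N @ alpha

# Reintegrating r with the camera returns the target RGB (up to round-off).
rho_check = S.T @ r
```

Changing `alpha` changes the spectrum `r` but leaves `rho_check` equal to `rho`, which is the property the plausible set is built around.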
In the next part of this section, we are going to introduce our physically plausible spectral reconstruction via the null-space coefficients reconstruction.

B. Reconstructing the Null-space Coefficients
Recall Equations (1a) and (1b): given a ground-truth radiance $r_{gt}$, the corresponding RGB is calculated by $\rho_{gt} = S^T r_{gt}$. This indicates that $r_{gt}$ is a member of $\mathcal{P}(\rho_{gt}; S)$, in which it corresponds to one unique α, denoted as $\alpha_{gt}$. On this basis, we translate our goal of making the spectral reconstruction algorithm search for the reconstruction $r_{rec}$ in $\mathcal{P}(\rho_{gt}; S)$ into seeking the null-space coefficients $\alpha_{rec}$ in $\mathbb{R}^{n-3}$ which best approximate $\alpha_{gt}$. We train the spectral reconstruction algorithm $\mathrm{SR}: \mathbb{R}^3 \to \mathbb{R}^{n-3}$ such that
$$\mathrm{SR}(\rho_{gt}) = \alpha_{rec} \approx \alpha_{gt}, \tag{10}$$
and the reconstructed spectrum $r_{rec} \in \mathcal{P}(\rho_{gt}; S)$ is then derived by
$$r_{rec} = S (S^T S)^{-1} \rho_{gt} + N \alpha_{rec}. \tag{11}$$
So far, the idea behind our physically plausible framework for spectral reconstruction has been established. Still, for CNN models the ground-truth labels are necessary, which means we are yet to calculate $\alpha_{gt}$ from the $r_{gt}$ in the hyperspectral images.
In Equation (5) we calculated the projection matrix $P^S$, which projects r onto the column space of S to give $r^{\parallel}$. Likewise, to derive $r^{\perp}$ we seek the projection of r onto the column space of N. The null-space projection matrix can be written as:
$$P^N = N (N^T N)^{-1} N^T \tag{12}$$
(which is equivalent to $I_{n \times n} - P^S$), such that
$$r^{\perp}_{gt} = P^N r_{gt}. \tag{13}$$
Together with Equation (8), we get
$$N \alpha_{gt} = N (N^T N)^{-1} N^T r_{gt}. \tag{14}$$
Finally, since the columns of N (i.e. the null-space basis vectors) are linearly independent, we derive
$$\alpha_{gt} = (N^T N)^{-1} N^T r_{gt}. \tag{15}$$
Figure 3 summarizes the framework of our physically plausible spectral reconstruction. In the reconstruction stage, the camera-subspace projection $r^{\parallel}$ is calculated directly from the RGB input and the spectral reconstruction algorithm only concerns the recovery of the null-space projection $r^{\perp}$. As the color reproduction of the reconstructed hyperspectral image only depends on $r^{\parallel}$ (Equations (4a) and (4b)), the reconstructed hyperspectral image is ensured to reproduce exactly the input RGB image. In the next section we are going to integrate this framework with the state-of-the-art spectral reconstruction model based on CNN.
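The training-label and reconstruction steps can be sketched as follows, under stated assumptions: S and the ground-truth spectrum are random placeholders, N is taken from an SVD, and a small random perturbation of the true coefficients stands in for the output of a trained network (no actual CNN is involved). The point of the sketch is that any coefficient error changes the spectrum but not the reintegrated RGB.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 31

# Hypothetical sensitivities and an orthogonal null-space basis (S^T N = 0).
S = rng.random((n, 3))
U, _, _ = np.linalg.svd(S, full_matrices=True)
N = U[:, 3:]

# Hypothetical ground-truth spectrum and its simulated RGB.
r_gt = rng.random(n)
rho_gt = S.T @ r_gt

# Training label: alpha_gt = (N^T N)^{-1} N^T r_gt.
alpha_gt = np.linalg.solve(N.T @ N, N.T @ r_gt)


def reconstruct(rho, alpha_rec):
    """r_rec = S (S^T S)^{-1} rho + N alpha_rec."""
    return S @ np.linalg.solve(S.T @ S, rho) + N @ alpha_rec


# With the exact coefficients the ground truth is recovered.
r_exact = reconstruct(rho_gt, alpha_gt)

# A perturbed alpha stands in for an imperfect network prediction.
alpha_rec = alpha_gt + 0.05 * rng.standard_normal(n - 3)
r_rec = reconstruct(rho_gt, alpha_rec)

# Spectral error is non-zero, but the reintegrated RGB matches the input.
rho_rec = S.T @ r_rec
```

In this sketch `r_rec` differs from `r_gt`, yet `rho_rec` equals `rho_gt` up to floating-point round-off.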

IV. IMPLEMENTATION
We build our models based on the HSCNN-R architecture, which is the 2nd place entry of the NTIRE2018 [5], [35] (its performance is similar to that of the 1st place HSCNN-D model; we use the 2nd place architecture simply because it was simpler in our development environment). As illustrated in Figure 4, the HSCNN-R model adopts a deep residual learning framework [21]. Each of the residual blocks is constructed with two convolutional layers and one ReLU layer. During training and reconstruction, the network maps 50 × 50 RGB image patches to the corresponding 31-channel hyperspectral patches (the spectral dimension runs from 400 to 700 nm with 10 nm sampling intervals). The final image is decided by the reconstruction outcome of 3 HSCNN-R networks with different filter numbers in each layer (64, 256 and 256) and depths (34, 20, and 30).
In this paper, we aim for two improvements on HSCNN-R: (1) perfect color reproduction and (2) robustness against exposure change.For the former, we integrate our physically plausible framework to HSCNN-R, and for the latter, we propose a new data augmentation process.To study the effects of both improvements, 3 new models listed in Table I are trained.

TABLE I: List of our new models
A. Physically Plausible HSCNN-R
In the original HSCNN-R model, the output layer corresponds to a 50 × 50 hyperspectral image patch with 31 spectral dimensions. To accommodate our physically plausible framework in HSCNN-R p and HSCNN-R pd , we reduce the spectral dimension from 31 to 28 in the output layer for recovering the image of null-space coefficients (α is (n − 3)-dimensional with n = 31).
Unlike the original hyperspectral data which only contains positive values, the null-space coefficients α allow negative entries, and this is not permitted for the ReLU output layer in the HSCNN-R architecture.As a result, it is necessary to re-center the ground-truth α such that the negative values are prevented.
In our implementation, we found empirically that the entries of the ground-truth α range between −1 and 1. Hence, in the training stage of HSCNN-R p we re-center α to a non-negative target (denoted α̃) that the ReLU output layer can represent, and the reverse function is used in the reconstruction stage to map the predicted α̃ back to α. As we will mention later, the HSCNN-R pd model requires a different re-centering function due to the implementation of our data augmentation process.
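As an illustration of what such a re-centering and its reverse could look like, the sketch below uses a simple affine map from [−bound, bound] to [0, 1]. This particular form, the function names and the `bound` parameter are assumptions made for illustration only; the text only requires that the re-centered values be non-negative and that the map be invertible.

```python
import numpy as np


def recenter(alpha, bound=1.0):
    """Assumed affine re-centering: maps [-bound, bound] onto [0, 1]."""
    return (np.asarray(alpha, dtype=float) + bound) / (2.0 * bound)


def uncenter(alpha_tilde, bound=1.0):
    """Reverse of `recenter`: maps [0, 1] back onto [-bound, bound]."""
    return 2.0 * bound * np.asarray(alpha_tilde, dtype=float) - bound


# Training: labels are re-centered before being compared with the ReLU output.
alpha_gt = np.array([-0.8, 0.0, 0.35])
label = recenter(alpha_gt)

# Reconstruction: the network output is mapped back before forming N @ alpha.
alpha_back = uncenter(label)
```

A wider coefficient range can be handled by passing a larger `bound`, as long as the same value is used in both directions.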
The rest of the hyperparameters of HSCNN-R p are kept the same as the original HSCNN-R model [35].Our HSCNN-R p is expected to provide absolute color reproduction.However, as shown in [28], the original HSCNN-R is not robust against intensity change, and this implies that HSCNN-R p also may not perform well in spectral recovery when the testing exposure condition varies.

B. Intensity-scaling Data Augmentation
We create the augmented data by simulating 'brighter' and 'dimmer' RGB images from the ground-truth hyperspectral images.Instead of generating all the new data before training the model, we draw different scaling constants in real time during training: all input image patches (and the same patch in different training epochs) are scaled differently, which allows the network to see more intensity variation in the data.
Furthermore, since a spectral reconstruction algorithm can potentially be implemented on an RGB camera, we especially want to ensure that the trained model performs equally well when the standard exposure settings of the camera (i.e. the aperture size and shutter speed) are adjusted. Recall that these settings by convention follow geometric progressions; more precisely, the available aperture sizes normally follow a sequential scaling change by √2, and the shutter speed is adjusted by a factor of 2 between adjacent modes. Based on this fact, we propose to draw the scaling constants ξ from a uniform distribution on a log scale:
$$\log_{\beta} \xi \sim \mathcal{U}(-1, 1). \tag{17}$$
In our implementation, we set β = 10 such that the scaling factor ξ is bounded by [1/10, 10]. For comparison, we train another intermediate model, HSCNN-R d , which only adopts the intensity-scaling data augmentation. This model is refined from the pre-trained HSCNN-R provided in [35], and all hyperparameters are kept the same.
On the other hand, to apply this new data augmentation framework to the physically plausible model, we need to adjust the re-centering function to accommodate the change in the range of the entries of α: since α depends linearly on the radiance spectrum, scaling the spectra by ξ ∈ [1/10, 10] (β = 10) widens that range by up to a factor of 10. Additionally, to make the model converge efficiently, we set the adaptive learning rate to follow a polynomial decay with the power of 25 (instead of the original 1.5). This final model is referred to as HSCNN-R pd .
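The log-uniform draw of the scaling constant can be sketched as below: an exponent is sampled uniformly in [−1, 1] and ξ is β raised to that exponent, so that with β = 10 the factor lies between 1/10 and 10. The function names and the patch-level helper are hypothetical; the sketch assumes the RGB input and its spectral target are scaled by the same ξ, consistent with the linear image formation model.

```python
import numpy as np


def draw_scale(rng, beta=10.0, size=None):
    """Draw xi so that log_beta(xi) is uniform on [-1, 1]."""
    exponent = rng.uniform(-1.0, 1.0, size=size)
    return beta ** exponent


def augment_pair(rgb_patch, spectral_patch, rng, beta=10.0):
    """Scale an RGB patch and its ground-truth spectra by one fresh xi.

    Called once per patch per epoch, so the same patch is seen at
    different intensities over the course of training.
    """
    xi = draw_scale(rng, beta)
    return xi * rgb_patch, xi * spectral_patch, xi


rng = np.random.default_rng(0)
rgb_patch = rng.random((50, 50, 3))        # hypothetical 50 x 50 RGB patch
spectral_patch = rng.random((50, 50, 31))  # hypothetical 31-band target
rgb_aug, spec_aug, xi = augment_pair(rgb_patch, spectral_patch, rng)
```

Sampling in the log domain gives halving and doubling the same probability, which matches the geometric progression of aperture and shutter settings described above.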
V. EXPERIMENT

A. Experimental Setup
We trained our new models (as listed in Table I) based on the ICVL database [4] (201 hyperspectral images), where we randomly split the database into 100 images for training, 50 for validation and 50 for evaluation.The CIE 1964 color matching functions [13] were selected as the camera sensitivity functions, by which the ground-truth RGBs (i.e. the CIEXYZ color coordinates) were simulated.We also tested the original HSCNN-D and HSCNN-R models (the pre-trained networks in [35] were directly used) to compare with our new models.
Our experiment concerns the performance of the models in terms of (1) color reproduction, (2) spectral recovery and (3) both performances under different exposure settings. We select the following error metrics:
• Color difference: CIE 1976 color difference (∆E)
$$\Delta E = \sqrt{(L^*_{gt} - L^*_{rec})^2 + (a^*_{gt} - a^*_{rec})^2 + (b^*_{gt} - b^*_{rec})^2} \tag{19}$$
• Spectral difference: Mean Relative Absolute Error (MRAE)
$$\mathrm{MRAE} = \frac{1}{n} \left\| \frac{r_{gt} - r_{rec}}{r_{gt}} \right\|_1 \tag{20}$$
Equation (19) shows the definition of the CIE 1976 color difference formula [32], where $(L^*_{gt}, a^*_{gt}, b^*_{gt})$ and $(L^*_{rec}, a^*_{rec}, b^*_{rec})$ are the CIELAB color coordinates of the ground-truth and reconstructed RGB colors, respectively. The transformation between CIEXYZ and CIELAB requires normalization by the 'white point' coordinates (i.e. the illumination color), for which we manually select the white point of each image as the RGB of the brightest achromatic pixel.
In Equation (20), $r_{gt}$ and $r_{rec}$ refer respectively to the ground-truth and reconstructed radiance spectra, and n is the number of spectral bands. The division is component-wise and the $L_1$ norm is calculated.
Note that both of the above metrics are defined pixel-wise, which means the performance at each pixel in an image is considered independently. In addition, both metrics involve normalization by a reference intensity: for ∆E the colors are divided by the illumination white-point coordinates, and for MRAE the spectral difference is divided by the ground-truth spectrum. This ensures that our performance measurements are independent of the overall intensity of the compared targets.
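Both metrics can be sketched per pixel as follows. The CIEXYZ to CIELAB conversion uses the standard CIE 1976 formulas with white-point normalization; the function names and the example values are illustrative assumptions and the code is not taken from the authors' implementation.

```python
import numpy as np


def xyz_to_lab(xyz, white):
    """CIEXYZ -> CIELAB (CIE 1976), normalized by the white point."""
    t = np.asarray(xyz, dtype=float) / np.asarray(white, dtype=float)
    delta = 6.0 / 29.0
    f = np.where(t > delta ** 3, np.cbrt(t), t / (3 * delta ** 2) + 4.0 / 29.0)
    L = 116.0 * f[..., 1] - 16.0
    a = 500.0 * (f[..., 0] - f[..., 1])
    b = 200.0 * (f[..., 1] - f[..., 2])
    return np.stack([L, a, b], axis=-1)


def delta_e76(xyz_gt, xyz_rec, white):
    """CIE 1976 color difference: Euclidean distance in CIELAB."""
    diff = xyz_to_lab(xyz_gt, white) - xyz_to_lab(xyz_rec, white)
    return np.linalg.norm(diff, axis=-1)


def mrae(r_gt, r_rec):
    """Mean relative absolute error over the spectral (last) axis."""
    r_gt = np.asarray(r_gt, dtype=float)
    r_rec = np.asarray(r_rec, dtype=float)
    return np.mean(np.abs(r_gt - r_rec) / r_gt, axis=-1)


# Hypothetical example values (not data from the paper).
white = np.array([0.95, 1.00, 1.09])
xyz_gt = np.array([0.30, 0.40, 0.20])
xyz_rec = np.array([0.32, 0.39, 0.22])
r_gt = np.linspace(0.2, 1.0, 31)
r_rec = 1.1 * r_gt
```

Scaling the colors together with the white point, or both spectra together, by a common factor leaves the two metrics unchanged, which is the intensity independence noted above.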
We test all models under 3 exposure settings: the original, half and double exposure. For each testing exposure, we uniformly scale all the evaluation images by the same scaling constant (respectively ξ = 1, 0.5 and 2), and the reconstructed hyperspectral images are compared with the ground-truth hyperspectral images scaled by the same constant.
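The exposure test protocol can be sketched with toy models in place of the CNNs. In the sketch below, `linear_model` and `affine_model` are hypothetical stand-ins (a least-squares linear regression and the same regression with a constant offset added), and the data are random placeholders. It illustrates why a purely linear map has the same MRAE at every exposure, while a model with an intensity-dependent behavior does not.

```python
import numpy as np


def mrae(r_gt, r_rec):
    """Mean relative absolute error over all pixels and bands."""
    return float(np.mean(np.abs(r_gt - r_rec) / r_gt))


def evaluate_under_exposures(model, rgb, spectra_gt, scales=(1.0, 0.5, 2.0)):
    """Scale inputs by xi and compare outputs with ground truth scaled by xi."""
    return {xi: mrae(xi * spectra_gt, model(xi * rgb)) for xi in scales}


rng = np.random.default_rng(0)
n = 31
S = rng.random((n, 3))                   # hypothetical sensitivities
spectra_gt = rng.random((200, n)) + 0.1  # hypothetical positive spectra
rgb = spectra_gt @ S                     # simulated RGBs, one row per pixel

# Toy model 1: linear regression from RGB to spectra (3 x n matrix).
M, *_ = np.linalg.lstsq(rgb, spectra_gt, rcond=None)


def linear_model(x):
    return x @ M


# Toy model 2: same regression plus a fixed offset, so it is not scale-equivariant.
def affine_model(x):
    return x @ M + 0.2


linear_scores = evaluate_under_exposures(linear_model, rgb, spectra_gt)
affine_scores = evaluate_under_exposures(affine_model, rgb, spectra_gt)
```

Here `linear_scores` is constant across the three exposures, whereas `affine_scores` varies with ξ.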

B. Result and Discussion
The performance statistics are shown in Table II. We show the mean and the worst-case (WC) performance of the models. The 'worst case' is defined per image as the averaged error of 'the worst 1000 pixels' (the image dimension is around 1300 × 1392), and the worst-case performance given in Table II refers to the mean worst-case error over the whole evaluation set.
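The worst-case statistic described above can be sketched as follows: for each image, average the 1000 largest per-pixel errors, then average that value over the evaluation set. The function names are hypothetical and the error maps in the example are random placeholders.

```python
import numpy as np


def worst_case_error(error_map, k=1000):
    """Mean of the k largest per-pixel errors in one image."""
    flat = np.ravel(error_map)
    k = min(k, flat.size)
    # After partitioning, the last k entries are the k largest values.
    worst = np.partition(flat, flat.size - k)[flat.size - k:]
    return float(worst.mean())


def dataset_worst_case(error_maps, k=1000):
    """Mean of the per-image worst-case errors over a set of images."""
    return float(np.mean([worst_case_error(m, k) for m in error_maps]))


rng = np.random.default_rng(0)
# Hypothetical per-pixel error maps for three images.
error_maps = [rng.random((130, 139)) for _ in range(3)]
wc = dataset_worst_case(error_maps, k=1000)
```

Using `np.partition` avoids a full sort, which matters little here but helps on full-resolution images of roughly 1300 × 1392 pixels.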
First, we see that the state-of-the-art HSCNN-D and HSCNN-R are not physically plausible. Indeed, the worst-case ∆E of these models is significant (according to [34], human observers can sense a noticeable difference above ∆E ≈ 2.3). Our physically plausible HSCNN-R p not only provides zero error in color reproduction, but also significantly improves the worst-case performance in terms of spectral recovery over the original HSCNN-R. However, as shown in Figure 6, HSCNN-R p and the original models provide poor spectral recovery performance when half and double exposure settings are applied.
Next, we can see very clearly in Figures 6 and 7 that the standalone HSCNN-R d (without the physically plausible training) shows a great advantage over the original HSCNN-D and HSCNN-R in both spectral recovery and color reproduction performance when the exposure condition changes. However, the worst-case color reproduction performance of HSCNN-R d is still sub-optimal.
Lastly, we want to 'jointly' consider the performance in spectral recovery and color reproduction. Frankly speaking, it is not possible to strictly say which performance is more important than the other, and depending on the application this relative importance can vary drastically. Despite this, we can combine the two metrics with an adjustable relative weight γ to see the model performance across all cases of relative importance. The resulting joint metric η in effect describes the 'competition' in importance between the two metrics. We show two comparisons in Figure 5: the mean MRAE against the mean ∆E in the top panel, and the worst-case MRAE against the worst-case ∆E in the bottom panel. Note that for each model we average the performances under the 3 testing exposure conditions, and we normalize all performances by the average performance across models (this brings MRAE and ∆E to the same order of magnitude). In both cases, whether mean or worst-case performance is considered, our proposed HSCNN-R pd performs the best overall.
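To make the idea of a weighted joint metric concrete, the sketch below assumes one plausible form, a convex combination η(γ) = γ·MRAE + (1 − γ)·∆E applied to metrics normalized by their across-model averages. This specific formula is an assumption made for illustration and may differ from the authors' definition; the model labels and error values are hypothetical placeholders, not results from the paper.

```python
import numpy as np


def joint_metric(mrae, delta_e, gamma):
    """Assumed joint metric: convex combination of normalized errors.

    Each metric is divided by its mean across models so that the two
    are of the same order of magnitude before being weighted.
    """
    mrae = np.asarray(mrae, dtype=float)
    delta_e = np.asarray(delta_e, dtype=float)
    mrae_n = mrae / mrae.mean()
    de_n = delta_e / delta_e.mean()
    return gamma * mrae_n + (1.0 - gamma) * de_n


# Hypothetical per-model errors (placeholders, not measured values).
models = ["model_a", "model_b", "model_c"]
mrae_vals = [0.020, 0.030, 0.025]
de_vals = [1.5, 0.0, 0.8]

# Sweep the relative weight and record the best model at each gamma.
gammas = np.linspace(0.0, 1.0, 11)
eta = np.stack([joint_metric(mrae_vals, de_vals, g) for g in gammas])
best = [models[i] for i in eta.argmin(axis=1)]
```

At γ = 0 only color reproduction counts and at γ = 1 only spectral recovery counts; the sweep shows which model is preferred in between.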
VI. CONCLUSION
Spectral reconstruction (SR) studies the mapping from RGB to hyperspectral images, which is regarded as a promising solution for low-cost, snapshot and high-resolution hyperspectral imaging. In the recent development of spectral reconstruction, leading models are based on Convolutional Neural Networks (CNN), providing remarkable spectral recovery performance. However, these models only aim to minimize the spectral recovery errors without ensuring the physical plausibility of the output spectra. Physical plausibility is defined as ensuring that the recovered spectrum integrates (using the underlying camera sensors) to the same RGB as it is recovered from. Existing methods, which do not have this property, estimate RGBs which are significantly different from those found in the original image.
In this paper we developed a physically plausible spectral reconstruction framework. Our insight is that all plausible spectra can be represented by a fixed camera-subspace projection spectrum, defined by a linear combination of the camera spectral sensitivities, and a null-space spectrum which does not contribute to the color formation. Following this insight, the spectral recovery problem sets out to reconstruct the null-space spectra from the RGB (instead of the original RGB to radiance mapping), such that the physical plausibility of the predicted radiance is guaranteed. Finally, we also addressed the issue of exposure invariance in spectral reconstruction [28], by proposing a new data augmentation framework to ensure the model robustness against intensity variations. As the exposure changes, our models provide leading performance considering both spectral recovery and color reproduction.

from the ICVL dataset [1] for training and 50 for validation (the spatial dimension of these images is around 1300×1392). The only difference is that now the corresponding raw RGB images were simulated by the spectral sensitivities of SONY IMX135 (one of the three cameras used in INTEL-TAU). That is, the two SR models were trained to map SONY IMX135 raw RGBs to hyperspectral image output.

II. RECONSTRUCTION
The two trained SR models were used to reconstruct the hyperspectral information from 6 selected raw RGB images from the INTEL-TAU dataset [5], all of which were taken by SONY IMX135.The spatial dimension of these images is 2448 × 3264.
Then, the reconstructed hyperspectral images were again reintegrated into raw RGB images with the spectral sensitivities of SONY IMX135.At this stage, the proposed HSCNN-R pd is expected to give the exact same RGBs as the input raw RGB images, whereas HSCNN-R can generate different RGBs.The goal of this supplementary test is to visually demonstrate how different (how wrong) this recovery can be from a colorimetric point of view.

III. COLOR FIDELITY TEST
From page 3 onward of this supplementary document, we are going to show several visual comparisons and quantitative error maps between the ground-truth and the SR predicted end-of-pipe RGB images. In this section we detail the image rendering process and the process of calculating the quantitative errors. In the processing pipeline of a camera, the raw image might undergo processes including, but not limited to, the following before being shown to end users: black level and saturation correction, white balancing, color correction and gamma correction. As we are already given the expected end-of-pipe image with each raw image in the INTEL-TAU database, we can alternatively build a 3D look-up table (LUT) which approximates the actual image processing pipeline: for each image, the LUT is built to relate each color in the ground-truth raw RGB image to the colors in the supplied (expected) end-of-pipe image.
This LUT can be optimized, in a least-squares sense, by lattice regression [6], [4]. To speed up the optimization process, we train the LUT on thumbnail images, where we simply downsample the images from the original 2448 × 3264 to 108 × 144, and bin the colors by 24 × 24 × 24 in the three color channels. Then, the full resolution ground-truth raw RGB and the raw RGB reintegrated from the reconstructed hyperspectral image are mapped to their respective end-of-pipe renditions by applying the same 3D LUT.
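The role of the LUT (one fixed color mapping, learned from a raw/rendered image pair and then applied identically to both the ground-truth raw and the reintegrated raw) can be illustrated with a much simpler stand-in for lattice regression. The sketch below is not lattice regression: it merely averages the target colors falling into each of the 24 × 24 × 24 bins and looks colors up by nearest bin, with no interpolation or regularization. The function names and the assumption that colors are normalized to [0, 1] are hypothetical.

```python
import numpy as np


def fit_binned_lut(raw, target, bins=24):
    """Per-bin average of target colors; raw/target are (num_pixels, 3) in [0, 1]."""
    idx = np.clip((raw * bins).astype(int), 0, bins - 1)
    flat = np.ravel_multi_index((idx[:, 0], idx[:, 1], idx[:, 2]), (bins,) * 3)
    counts = np.bincount(flat, minlength=bins ** 3)
    lut = np.zeros((bins ** 3, 3))
    for c in range(3):
        lut[:, c] = np.bincount(flat, weights=target[:, c], minlength=bins ** 3)
    filled = counts > 0
    lut[filled] = lut[filled] / counts[filled][:, None]
    # Bins never seen in training stay at zero in this simplified version.
    return lut.reshape(bins, bins, bins, 3), filled.reshape(bins, bins, bins)


def apply_binned_lut(lut, raw):
    """Map each raw color to the stored color of its bin (nearest-bin lookup)."""
    bins = lut.shape[0]
    idx = np.clip((raw * bins).astype(int), 0, bins - 1)
    return lut[idx[:, 0], idx[:, 1], idx[:, 2]]


# Hypothetical thumbnail pixels: raw colors and their rendered counterparts.
raw_thumb = np.array([[0.1, 0.1, 0.1], [0.9, 0.9, 0.9], [0.1, 0.1, 0.1]])
rendered_thumb = np.array([[0.2, 0.2, 0.2], [0.8, 0.8, 0.8], [0.4, 0.4, 0.4]])
lut, filled = fit_binned_lut(raw_thumb, rendered_thumb)

# The same LUT is applied to any raw image that should be rendered.
rendered = apply_binned_lut(lut, np.array([[0.1, 0.1, 0.1], [0.9, 0.9, 0.9]]))
```

Because both raw images pass through the same mapping, any difference between the two rendered results comes from differences in the raw colors themselves.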
In Figures 3-8, an example image is shown in the bottom-left of each figure, in which the 4 regions of interest are marked with white squares. The 'Ground Truth' image (top-left of each figure) is actually the end-of-pipe image rendered by the trained 3D LUT mapping. On the other hand, from the ground-truth raw RGB we carry out spectral reconstruction (using the two trained SR models) and reintegrate the recovered hyperspectral images with the camera sensitivities to get an approximate raw image. By applying the same LUT to this derived raw image we generate the end-of-pipe images predicted by the two SR models, as shown in the top-middle and top-right images in each figure. We can already see that HSCNN-R, as a physically non-plausible spectral reconstruction model, introduces color shifts that are quite visible after color rendering, while our physically plausible HSCNN-R pd successfully preserves the original colors in the ground-truth images. To further quantify the color shifts, we need to calculate the color difference between the ground-truth and the reintegrated RGB images.

B. Quantifying color differences
We wish to use the CIE 1976 color difference (∆E) [7] to quantify the colorimetric errors.Since the ∆E is defined in CIELAB color coordinates (as shown in Equation (19) in the main paper), we must consider how we transform the camera raw RGB to their CIELAB counterparts.
The procedure is summarized in Figure 2. Unlike in the main paper, where the CIELAB coordinates can be transformed directly from the CIEXYZ colors (with the white point color this mapping is one-to-one [10]), the mapping from the real camera's raw RGB to CIELAB is unknown if the raw data is the only given information. Fortunately, INTEL-TAU also provides with each raw image the color correction matrix (CCM) that transforms the image into sRGB colors and the ground-truth white point (WP) that ensures a one-to-one mapping between sRGB and CIELAB [10]. Finally, the desired ∆E color difference between ground-truth and reintegrated RGB images can be calculated from the transformed CIELAB images.
We show the ∆E error maps in the bottom-middle and bottom-right of Figure 3-8, which detail the pixel-wise colorimetric errors introduced by the two trained SR models in the 4 selected regions of interest.It is evident that HSCNN-R recovers spectra that reintegrate into wrong colors with significant errors (we remark once again that referring to [8] human observers can sense noticeable color difference above ∆E ≈ 2.3).Remarkably, our proposed HSCNN-R pd model -which possesses both physical plausibility and exposure invariance -preserves complete color fidelity.

Fig. 3: The training (left) and reconstruction scheme (right) of our physically plausible spectral reconstruction.

Fig. 5: Joint metric η versus the relative weights γ between MRAE and ∆E. Respectively the top panel considers the mean and the bottom panel considers the worst-case MRAE and ∆E errors. The solid colored areas under the lowest curve indicate the best model at each γ.

Fig. 6: Visualization of spectral recovery errors by MRAE heat maps. All models are tested under original exposure (top row), half exposure (middle row) and double exposure (bottom row).

Fig. 7: Visualization of color reproduction errors by ∆E heat maps. Referring to [34] the threshold for human observers to notice the difference is around ∆E ≈ 2.3. All models are tested under original exposure (top row), half exposure (middle row) and double exposure (bottom row).

Fig. 2: The process of calculating CIE 1976 color difference ∆E between ground-truth and reintegrated color images.

TABLE II: The mean and the worst-case (WC) hyperspectral image reconstruction error in ∆E and MRAE under original, half and double exposure settings. Best results are shown in red and the second-best results are shown in blue.