## 1. Introduction

Spectral filter array (SFA) cameras are a new single-shot spectral imaging technology [1], which is gaining popularity in different fields of research [2]. The light entering the camera is filtered with narrow spectral bandpass filters on each pixel or subpixel. Spatial decomposition of the spectral signal allows capturing of all spectral bands at the same instant.

Prototypes have been proposed in academia [3] and commercial models are now available, including the XIMEA xiSpec camera [4,5] and the Silios Technologies SFA camera [6]. With increased adoption and commercial availability of SFA cameras, it is important to analyze the parameters contributing to the image quality of these cameras and provide tools to guide further development for specific applications.

Image quality performance of cameras for close range imaging is a broad field of research [7,8,9] covering many different aspects including: spatial resolution [10,11,12], spectral or color accuracy [3,13,14], reproducibility, noise behavior [15], optical distortions and post-processing steps. The required accuracy of spectral reconstructions, number of channels and wavelengths of interest are application dependent and should be evaluated in the context of specific applications. If SFAs combine accurate spectral reconstruction with real-time acquisition speed and ease of use, they could potentially be a powerful new imaging modality for the medical field. Digital imaging is already widely adopted for skin imaging, which could benefit from additional spectral information [16,17,18,19,20]. Small color variations in the skin can carry relevant information for physicians. There is a need for more reliable and quantitative non-contact methods to measure physiologic parameters of patients. SFA cameras could combine non-contact monitoring of vital functions and diagnosis of diseased skin tissue in real time [21,22,23,24,25,26,27,28]. In particular, dynamic processes such as oxygenation would highly benefit from spectral and spatially resolved images in real time [27,29,30,31,32].

Previous work by Preece and Claridge [33] has investigated optimal filter sensitivities for a three-channel system for skin diagnosis. An extensive hardware focused analysis of spectral imagers for biomedical applications is provided by Gutiérrez-Gutiérez et al. [34]. The main focus of their work was the technical limitations including acquisition speed, efficiency, object plane curvature, spatial resolution, distortions, and noise. They emphasized that an imaging system for biomedical applications should be selected after thorough testing of these parameters. A comprehensive emulation framework has been proposed by Saager et al. [35], giving an overview of the performance of different spectral imagers, including a xiSpec SFA camera and an RGB sensor, in a burn wound mouse model and a photoaging experiment. High-resolution spectral measurements were performed using a spatial frequency domain spectroscopy (SFDS) system. In the computer graphics domain, Jimenez et al. [36] and Iglesias-Guitian et al. [37] described physically based skin appearance models to show color changes due to emotions or ageing. The same models can be used to generate skin reflectance training sets.

The aim of this study is the development and testing of a framework for comparison of SFA cameras for spectral reconstruction, skin imaging, and oxygenation level estimation without prior patient measurements. A generated specialized training set is quantified for spectral reconstruction.

This framework could be considered prior to the hardware focused selection by Gutiérrez-Gutiérez et al. [34] and provides a simplified measurement free alternative to the method proposed by Saager et al. [35]. The framework could also be applied as a guide for the development of application-specific SFA cameras.

The three aims of the study can be formulated as:

- a comparison framework of spectral filter array cameras for skin imaging and medical diagnosis;
- an illustration of the impact of spectral reflectance reconstruction using a specialized training set for SFA camera applications in skin imaging;
- a recommendation of commercially available SFA cameras for monitoring of vital functions and diagnosis.

## 2. The Proposed Framework

The proposed framework has three main elements: (1) calculation of a spectral reconstruction matrix, (2) simulated sensor responses and (3) an evaluation block. It is shown in Figure 1 and follows the concepts of a spectral filter array processing pipeline proposed by Lapray et al. [38].

As a first part, a spectral reconstruction is performed to estimate the full spectra using the limited number of SFA bands, providing a measure of the performance of the different cameras independent of applications. In addition, the estimated spectra are then analyzed regarding their accuracy for oxygenation level estimation as an example of a specific application. Three SFA cameras (one prototype and two commercially available) and an RGB camera are evaluated. The impact of Gaussian spectral bands (GSB) is tested by simulating sensor sensitivities with Gaussian shapes for each of the SFA cameras' channels.

A set of 10,000 skin reflectances [39] is generated using a Monte Carlo skin model and compared to a Munsell reflectance patch database [40,41] for training the spectral reconstruction. A database of spectral measurements of skin reflectances (100 measurements) [42] is used for testing the spectral reflectance reconstruction. The spectral reconstruction accuracy is compared numerically using Root Mean Square Error (RMSE) and $\Delta {E}_{00}$ color differences [43]. Differences in estimated oxygenation levels are numerically compared using a proposed metric. Spatial aspects are not considered in this study since standard clinical measurements of oxygenation levels are usually averaged over a small area and the skin simulation considers only homogeneous tissue over the simulated surface.

## 3. Prerequisites

For full spectral reconstruction simulated sensor responses are needed. The spectral reconstruction accuracy needs to be evaluated regarding spectral accuracy and in relation to specific applications. The framework could be applied to any channel-based spectral imager with known spectral sensitivities. For comparing specific spectral imagers, sensor sensitivities, training and test data and evaluation metrics must be chosen.

#### 3.1. Spectral Imaging Model and Spectral Reconstruction

Spectral reconstruction is a technique for estimating full spectra from a limited number of bands. It is useful because the wavelengths of interest might be unknown prior to the practical application, and it allows comparison of spectral cameras with different sensitivity peaks in a common space.

The spectral reconstruction is based on the inversion of a commonly known imaging model, which can be described with the equation:

$${P}_{i} = \int E\left(\lambda \right)\,{R}_{j}\left(\lambda \right)\,{Q}_{i}\left(\lambda \right)\,d\lambda \qquad (1)$$

where ${P}_{i}$ is the channel response of the ${i}^{th}$ channel of the sensor, $E\left(\lambda \right)$ is the illumination spectral power distribution (SPD) per wavelength, ${R}_{j}\left(\lambda \right)$ is the spectral reflectance of sample $j$ and ${Q}_{i}\left(\lambda \right)$ describes the spectral sensitivity of the ${i}^{th}$ channel of the sensor. Noise can be described as an additive constant to each channel.

Two simplifications have been applied to the imaging model for this study. Noise per channel has not been considered and the illumination has been assumed to be equi-energy. Both variables influence the performance of the cameras in a real setup. Specific light-source power distributions might favor a particular camera, hindering the comparability. A mathematical description of noise might not be an adequate description of the practical noise behavior of a physical camera. A chosen noise model could also favor one camera in the comparison.

This model can be inverted for spectral reconstruction by estimating ${R}_{j}\left(\lambda \right)$. Several different techniques have been proposed, including the pseudo-inverse method [44] (linear least-square fitting) or linear least-square fitting in lower-dimensional space (Imai–Berns method) [45]. For this study, a commonly used spectral reconstruction technique known as Wiener estimation [45,46,47,48,49,50] is applied. Before inverting Equation (1) it is rewritten into discrete formulation:

$${P}_{i} = \sum_{k=1}^{N} E\left({\lambda}_{k}\right)\,{R}_{j}\left({\lambda}_{k}\right)\,{Q}_{i}\left({\lambda}_{k}\right) \qquad (2)$$

$N$ is the number of spectral bands, depending on the wavelength range and spectral resolution; in this case, $\lambda \in [400,700]$ nm with a sampling rate of 2 nm steps and $N=151$. For all $j$ reflectances of the training set, the channels $i$ of the sensor and $k$ distinct spectral bands, we can write in matrix form:

$$\mathbf{p} = \mathbf{R}\,\mathbf{E}\,\mathbf{Q} \qquad (3)$$

$\mathbf{p}$ is of $J\times I$ dimensionality with $J$ spectral samples and $I$ channels, $\mathbf{R}$ of $J\times N$, $\mathbf{E}$ of $N\times N$ (diagonal matrix) and $\mathbf{Q}$ of $N\times I$, where $N$ is 151 different wavelengths for this research. This is inverted according to the Wiener estimation method [45,46,47]. This study follows the implementation by Nishidate et al. [49], which describes a reconstructed reflectance with:

$$\tilde{\mathbf{r}} = \mathbf{W}\,\mathbf{p} \qquad (4)$$

where $\mathbf{W}$ describes the Wiener estimation matrix, $\tilde{\mathbf{r}}$ the resulting vector of reflectance estimation or reconstruction and $\mathbf{p}$ the vector of sensor responses for each channel. The Wiener matrix is calculated by minimizing the square error of reconstructed and given reflectance for a training set of reflectances.

This matrix needs to be calculated for each camera and training set combination. Sensor responses can be simulated by multiplying the sensor sensitivities and the reflectance spectrum of an object. Spectral reconstructions can then be performed given this sensor response and the pre-trained Wiener estimation matrix $\mathbf{W}$.
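The training and reconstruction steps described above can be sketched in Python with NumPy. This is a minimal, noise-free illustration and not the implementation used in the study: the function names are hypothetical, the illuminant defaults to equi-energy as assumed in the text, and the matrix is obtained with an ordinary least-squares fit, which is one way to minimize the squared reconstruction error over a training set.

```python
import numpy as np


def simulate_responses(reflectances, sensitivities, illuminant=None):
    """Simulate noise-free sensor responses p = R E Q.

    reflectances:  (J, N) array, one spectrum per row
    sensitivities: (N, I) array, one channel per column
    illuminant:    (N,) array; defaults to equi-energy (all ones)
    """
    n_bands = reflectances.shape[1]
    if illuminant is None:
        illuminant = np.ones(n_bands)
    return reflectances @ np.diag(illuminant) @ sensitivities


def train_reconstruction_matrix(train_reflectances, train_responses):
    """Estimate W (N x I) so that r is approximately W p on the training set."""
    # Solve train_responses @ W.T ~= train_reflectances in the least-squares sense.
    w_transposed, *_ = np.linalg.lstsq(
        train_responses, train_reflectances, rcond=None
    )
    return w_transposed.T


def reconstruct(w, responses):
    """Apply the reconstruction to each row of `responses`: (J, I) -> (J, N)."""
    return responses @ w.T
```

In use, `train_reconstruction_matrix` would be called once per camera and training set combination, and `reconstruct` would then be applied to the simulated responses of the test reflectances.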

#### 3.2. Sensors

Most SFA sensors are based on micro interference filters (often Fabry–Pérot interference) that can be simulated with GSB as shown by Lapray et al. [51] with width and shape as main parameters [52,53].

The framework enables the comparison of any multi-band sensors with known spectral sensitivities and the optimization of the design of 'virtual' SFA cameras for specific applications. SFA cameras have a limited number of wavelength bands divided over the sensor. The design of SFA sensors will be a trade-off between spectral resolution and spectral range covered. A narrower spectral band per filter will improve the spectral resolution, but would require more spectral bands to cover the whole sensitivity range. Broader sensitivities, on the other hand, reduce the spectral resolution, but require fewer filters and avoid “holes” in the covered spectrum. However, for specific applications only a few primary wavelengths are needed, as in the case of oxygenation estimation.

In this study, we included simulated GSB, chosen with a full width at half maximum (FWHM) that makes them comparable with the real sensor sensitivities of the cameras tested.
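As an illustration of how such Gaussian spectral bands might be generated, the following sketch builds a sensitivity matrix from band centers and a common FWHM. The function name, the band centers and the FWHM value are hypothetical and are not the parameters of any camera in this study; only the 400 nm to 700 nm grid in 2 nm steps is taken from the text.

```python
import numpy as np


def gaussian_bands(wavelengths, centers, fwhm):
    """Return an (N, I) matrix of unit-peak Gaussian sensitivities.

    wavelengths: N sample wavelengths in nm
    centers:     I band centers in nm
    fwhm:        full width at half maximum in nm (same for all bands)
    """
    # Convert FWHM to the standard deviation of the Gaussian.
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    wl = np.asarray(wavelengths, dtype=float)[:, None]
    c = np.asarray(centers, dtype=float)[None, :]
    return np.exp(-0.5 * ((wl - c) / sigma) ** 2)


wavelengths = np.arange(400, 701, 2)   # 151 samples, as in the text
centers = np.linspace(420, 680, 8)     # hypothetical 8-band layout
q = gaussian_bands(wavelengths, centers, fwhm=35.0)  # hypothetical FWHM
```

Each column of `q` can then take the place of a measured channel sensitivity when simulating sensor responses.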

#### 3.3. Training and Test Set

The training set will contribute to the accuracy of spectral reconstruction using Wiener estimation, which calculates a transformation matrix that translates SFA responses to a full spectrum. This transformation matrix should minimize the difference between the reference spectrum and a reconstructed spectrum. The reference spectra used to determine this matrix are called the training set.

For training, two sets were compared to see the impact on the reconstruction accuracy for the different cameras: the Munsell database, which is used as a standard for color testing, and a second training set generated for skin color simulation using a wide array of skin optical properties. The skin simulation (training set) assumes an equi-energy illumination and therefore represents illumination-corrected skin spectra. Both sets are normalized using feature scaling so that all values cover a range from 0 to 1. A more detailed description of this skin database follows in the experimental setup. For the validation of the spectral reconstruction, another set based on skin reflectances was used. These skin reflectances (test set) are measured using a spectrophotometer and illumination corrected as described in [42].
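The feature scaling mentioned above could look like the following sketch. It assumes that the minimum and maximum are taken over the whole set, which the text does not specify; scaling each spectrum separately would be an equally plausible reading. The function name is hypothetical.

```python
import numpy as np


def feature_scale(spectra):
    """Min-max scale a set of spectra so that its values span [0, 1].

    Assumption: one global minimum and maximum per set, not per spectrum.
    """
    spectra = np.asarray(spectra, dtype=float)
    lowest, highest = spectra.min(), spectra.max()
    if highest == lowest:
        raise ValueError("cannot scale a constant set")
    return (spectra - lowest) / (highest - lowest)
```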

The three sets are illustrated in Figure 2. This figure allows comparison of the area covered by all sets and highlights three reflectances for each dataset. It includes the database of 100 measured skin reflectances [42], 10,000 Monte Carlo simulated reflectances and the Munsell color patch reflectances [40,41].

#### 3.4. Evaluation Metrics

The proposed framework can be validated by applying it to a specific clinical application, oxygen level estimation. This should show which spectral filter array camera is most suitable for this specific application. Three different evaluation metrics are considered. Two of the metrics focus on spectral reconstruction quality regarding shape and color. The third metric is application-specific and in this case quantifies the ability of each camera to estimate oxygen levels; it is discussed in detail in the next section.

The first metric calculates the color difference $\Delta {E}_{00}$ [43] of two spectra, which is the distance between two colors in the human perceptual color space. Each spectrum is converted into color coordinates using D65 illumination and the CIE 1931 2 Degree Standard Observer color-matching functions. A $\Delta {E}_{00}$ of around 2 is a just noticeable color difference for a human observer.

The second spectral reconstruction metric is the root mean square error (RMSE) between the reference spectrum and a reconstructed spectrum. There is no need to include the goodness of fit coefficient (GFC) or the angular error, since previous studies [54] have shown that these correlate strongly with the RMSE.
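For reference, the RMSE between a reference and a reconstructed spectrum can be computed as in the short sketch below. The function name is hypothetical, and the error is averaged over the wavelength axis so that a batch of spectra yields one value per spectrum.

```python
import numpy as np


def rmse(reference, reconstructed):
    """Root mean square error along the wavelength (last) axis."""
    ref = np.asarray(reference, dtype=float)
    rec = np.asarray(reconstructed, dtype=float)
    return np.sqrt(np.mean((ref - rec) ** 2, axis=-1))
```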

#### 3.5. Application-Specific Metric and Oxygenation Level Estimation

The third metric is a validation of the oxygenation level estimations. This parameter can be approximated through calculations using the reflectance spectrum of skin. The reflectance spectrum of skin is the result of concentrations of particular chromophores present in the skin. The ratio between oxygenated and deoxygenated hemoglobin reflects the relative oxygenation level in the skin and is an important parameter for diagnostics. Hemoglobin occurs in different forms but only these two are relevant for oxygenation. Different methods have been proposed to estimate oxygenation levels from particular wavelengths [27,29,49,55].

For this study, the estimation uses a multiple regression method described by Nishidate et al. [49]. A fast way of estimating absorbance $A\left(\lambda \right)$ from reflectance assumes the Lambert-Beer law:

$$A\left(\lambda \right) = -{\log}_{10} R\left(\lambda \right) \qquad (5)$$

According to the simplified Lambert-Beer law the total absorbance of skin tissue can be described with:

where ${\epsilon}_{m},{\epsilon}_{b},{\epsilon}_{ob}$ and ${\epsilon}_{db}$ describe the molar extinction coefficients of melanin, bilirubin, oxygenated and deoxygenated hemoglobin and ${C}_{m},{C}_{b},{C}_{ob}$ and ${C}_{db}$ describe the concentration of each specific chromophore. ${l}_{e}$ describes the mean optical path length for the epidermis, ${l}_{d}$ for the dermis and $D\left(\lambda \right)$ describes the attenuation due to scattering; these values are taken from literature. This equation can be solved by multiple regression analysis and is therefore reformulated to:

where ${c}_{m},{c}_{b},{c}_{ob},{c}_{db}$ are closely related to the concentrations of melanin, bilirubin, oxygenated and deoxygenated blood and represent the unit-less contribution of each extinction coefficient to the total absorbance $A$. Any number of wavelengths can be used to calculate the absorbances. Reflectance spectra can be converted to absorbance spectra according to Equation (5) and then used with the following equation. The calculation of the concentration of any chromophore can then be formulated in matrix notation as:

Finally, oxygen saturation can be calculated with:

$$\text{oxygen saturation} = \frac{{c}_{ob}}{{c}_{ob}+{c}_{db}} \qquad (9)$$

Even though they are a simplification of the physical light-skin interactions, methods based on these principles have been used for oxygenation level estimation [49,56,57,58]. This approach allows rapid calculation of tissue parameters with low computational complexity. It is assumed that most other chromophores are constant over time. The oxygenation of blood is not constant, due to oxygen consumption by tissue. According to Equation (9), the oxygenation level is estimated from both the reference reflectance spectra and the reconstructed spectra. The Euclidean distance between the two resulting oxygenation level estimates is then calculated and used as a quality metric to judge the reconstruction accuracy.
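The estimation chain of this section (reflectance to absorbance, multiple regression against extinction coefficients, saturation ratio, and the difference between reference and reconstruction) can be sketched as follows. This is an illustration under stated assumptions rather than the method of Nishidate et al. [49]: the function names are hypothetical, the extinction coefficients must be supplied by the caller from literature, the base-10 logarithm is assumed for the absorbance, and a constant regression term is added here as a stand-in for the scattering attenuation.

```python
import numpy as np


def absorbance(reflectance):
    """Convert reflectance to absorbance, A = -log10(R) (assumed base 10)."""
    r = np.clip(np.asarray(reflectance, dtype=float), 1e-6, None)
    return -np.log10(r)


def estimate_saturation(reflectance, extinction):
    """Estimate oxygen saturation from reflectance at K wavelengths.

    reflectance: (K,) reflectance values at the chosen wavelengths
    extinction:  (K, 4) extinction coefficients, columns ordered as
                 melanin, bilirubin, oxygenated Hb, deoxygenated Hb
    """
    a = absorbance(reflectance)
    # Assumption: a constant column absorbs the scattering offset.
    design = np.column_stack([extinction, np.ones(len(a))])
    coeffs, *_ = np.linalg.lstsq(design, a, rcond=None)
    c_ob, c_db = coeffs[2], coeffs[3]
    return c_ob / (c_ob + c_db)


def oxygenation_difference(reference, reconstructed, extinction):
    """Distance between the two saturation estimates (scalar case)."""
    return abs(
        estimate_saturation(reference, extinction)
        - estimate_saturation(reconstructed, extinction)
    )
```

With six wavelengths the regression has more equations than unknowns. With only three wavelengths the system in this sketch is underdetermined and `np.linalg.lstsq` returns the minimum-norm solution, so a reduced chromophore model would be needed in practice.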

## 5. Results and Discussion

#### 5.1. Training Set Validation

The first results presented in this study address the skin simulation database and can be seen as an additional verification for using this simulated training set. It is based on principal component analysis (PCA) of the sets included in this research.

The principal components allow representation of the multidimensional set in a lower-dimensional space. If the principal components are calculated for a combined set, they represent the orthogonal axes of a space describing the sets. The area covered by a set plotted into this orthogonal space describes the diversity of that particular set. If multiple sets are plotted into the same principal component space, the difference in diversity and area covered within that PCA space can be analyzed.
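A minimal sketch of this procedure is given below: the principal components are fitted on the combined set and each individual set is then projected into the same space. The function names are hypothetical, and the components are obtained from a singular value decomposition of the mean-centered data, which is one common way to compute a PCA.

```python
import numpy as np


def fit_pca(spectra, n_components=2):
    """Fit PCA on (J, N) spectra; return mean, components, variance ratios."""
    x = np.asarray(spectra, dtype=float)
    mean = x.mean(axis=0)
    # Rows of vt are the orthonormal principal axes, ordered by variance.
    _, singular_values, vt = np.linalg.svd(x - mean, full_matrices=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    return mean, vt[:n_components], explained[:n_components]


def project(spectra, mean, components):
    """Project spectra into the space spanned by `components`."""
    return (np.asarray(spectra, dtype=float) - mean) @ components.T
```

For example, calling `fit_pca(np.vstack([set_a, set_b]))` and then `project(set_a, mean, components)` and `project(set_b, mean, components)` gives coordinates for both sets on shared axes, so the areas they cover can be compared directly.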

The sets are shown along the first two principal components of the combined set in Figure 5.

Table 3 shows the resulting principal components of each of the sets and the combined set. The Munsell set is the most diverse considering its low first principal component. The skin simulation set covers a wider range of reflectances compared to measured skin reflectances. This is represented in a lower first principal component. The simulated physiological parameters cover a wider range than found in living tissue (see Table 2).

In Figure 5 it can be observed that the skin simulation covers all the measured skin reflectances except for a few measurements. This can be ascribed to the limited number of parameters for the simulation, resulting in some measured skin reflectances not being represented within the skin simulation. The skin model is limited to Caucasian fair skin and initially designed for neonatal babies. To further analyze the parameters of the skin simulation that fall far outside the measured skin reflectances, the extreme curves were plotted. Figure 5D,E shows these extreme curves of both the skin reflectance and the skin simulation set as marked in Figure 5A. In Table 4 it becomes apparent that the main factor for the simulations is the blood volume parameter. All extreme results according to the PCA have an extreme value for the blood volume. The melanin parameter also contributes to extreme values within the principal component space, indicating the strong influence of melanin on the resulting skin spectra. In this principal component space, the bilirubin concentration parameter spreads the distribution of points.

Figure 5 also contains sRGB [64] color swatches reproduced under a virtual D65 illumination. These provide a visual impression of the color of the extreme points in the principal component space. They show that the extreme value curves not included within the skin simulation represent darker skin types and that extreme values of the skin simulation can include physiologically unlikely scenarios of grey skin.

#### 5.2. Spectral Reconstruction

Results for the two spectral reconstruction metrics calculated for each of the four sensors and their simulated GSB versions are shown in Figure 6. Each of the graphs shows mean results and standard deviation of the actual sensor as a circle and the GSB sensor results as a cross. All metrics are calculated with the different training sets (Munsell and skin simulation) for the spectral reconstruction and plotted. The Cartesian coordinate system consists of the number of channels on the x-axes and the value for each of the metrics on the y-axes.

These plots allow the comparison between the sensors according to the different metrics in two scenarios. It can be observed that the performance in RMSE and $\Delta {E}_{00}$ correspond to each other.

Figure 6 provides a plot of the $\Delta {E}_{00}$ difference between the test reflectances and their reconstruction. Surprisingly, the plots show that the corrected Ximea performs the worst in the case of Munsell patches for training and according to $\Delta {E}_{00}$. This can be ascribed to the cut-off in spectral sensitivity imposed by the linear correction transformation. Figure 4 shows the low sensitivity of this sensor at the edges of the chosen spectral range (400 nm to 700 nm).

Figure 7 contains plots of the ground-truth and reconstructed spectral reflectances that are responsible for the highest $\Delta {E}_{00}$ results for the corrected and uncorrected Ximea camera. The plot allows appreciation of the areas of the spectra that cause high $\Delta {E}_{00}$ results. In the case of the corrected Ximea camera, spectral regions that have low or zero sensitivity are wrongly reconstructed. This is not surprising but confirms the poorer performance of the corrected Ximea camera in comparison with the uncorrected Ximea camera in the $\Delta {E}_{00}$ and RMSE metrics. The more limited spectral coverage of the corrected spectral imager negatively influences the spectral reconstruction ability of this camera.

The second worst performer regarding color differences (Mean $\Delta {E}_{00}$ = ~14 and Mean $\Delta {E}_{00}$ = ~12) is the RGB camera. Both the low number of channels and their specific overlap in the spectral region seem to influence the estimation accuracy negatively. The lower performance of the GSB version can be ascribed to the low sigma ($\sigma =15$) of the Gaussian filters. In the case of the RGB sensor, the coverage of the spectral range of interest is, as seen in Figure 3, not optimal. The spectral distribution shows significant areas of very low spectral sensitivity, which negatively influences the spectral reconstructions.

Both the corrected (CorXim) and uncorrected Ximea benefit greatly from GSB, which improve the performance according to the $\Delta {E}_{00}$ metric. For the Silios camera, the GSB only improve the $\Delta {E}_{00}$ performance when using the specialized training set, i.e., the skin simulation set. One explanation could be the sharp cut-off for the GSB resulting from the bands that exceed the spectral range of this analysis. The prototypical sensor France1 already has close to Gaussian sensitivities and does not benefit from the GSB.

The RMSE metric shows a similar trend compared to $\Delta {E}_{00}$. The Ximea camera scores better results regarding the RMSE in comparison with $\Delta {E}_{00}$. Differences between original sensors and GSB sensors are smaller considering this metric.

Training the Wiener estimation matrix with the proposed specialized skin simulation set results in a more robust reconstruction according to $\Delta {E}_{00}$ and RMSE for all tested cameras. The more general Munsell set lacks skin spectral shapes and contains spectra that are too dissimilar in comparison with the skin test set. The similar shapes and increased number of spectra in the generated specialized database improve the spectral reconstructions.

#### 5.3. Oxygenation Level Estimation

The oxygenation estimations were performed using six wavelengths as proposed by Nishidate et al. (500 nm, 520 nm, 540 nm, 560 nm, 580 nm, 600 nm) and three wavelengths (480 nm, 560 nm, 600 nm). The results are shown in Figure 8.

These two oxygenation metrics show different behavior for all cameras compared to the spectral accuracy metrics. The eight and nine channel cameras (France1 and Silios) perform the worst for the Munsell training case and six wavelengths. This is surprising since these two cameras perform the best according to the spectral reconstruction metrics $\Delta {E}_{00}$ and RMSE. For this case, the performance differences between the GSB sensor and the original sensor are very small. One explanation can be that these key wavelengths all fall into valleys between the sensitivity peaks for the Silios and France1 sensors. The GSB sensors could be affected equally or more strongly, due to the relatively small sigma ($\sigma =15$).

The wavelengths proposed by Nishidate et al. are optimized for an RGB sensor. For the specialized training set, the RGB camera performs the worst, illustrating that the spectral reconstruction using a specialized training set benefits from narrow spectral channels.

Figure 8 also contains results for the oxygenation metric using three wavelengths (480 nm, 560 nm, 600 nm). It can be observed that the choice of the training set for this configuration influences the different cameras independently. For Munsell patch training, the RGB camera performs the worst and both versions of the Ximea camera the best. Using the specialized training set, the differences between all cameras are smaller and the RGB camera still performs worst. The other sensors are less affected by the change of training sets, only slightly lowering their oxygenation metric differences when using the specialized training set. For the idealized GSB RGB sensor, lower oxygenation metric differences can be observed compared to some of the SFA sensors. This could be ascribed to the wavelengths chosen for oxygenation level estimation, which all fall well within the high sensitivity of the Gaussian RGB (GRGB) sensor.

A camera with sensitivity peaks at the wavelengths of interest should perform optimally. This can be used if the wavelengths of interest are known. None of the investigated cameras has optimal filter sensitivity peaks for oxygenation estimation.

Table 5 provides an overview of the statistical results for all sensors, considering the better performing skin simulation training data set.

The proposed specialized training set improved the final oxygenation parameters (estimated with three wavelengths). In the case of six wavelengths the skin training set performs worse than the Munsell set. One explanation is that using six wavelengths includes wavelengths at the outer edges of the considered spectral range. The specialized set provides too little variety for these areas and the diverse Munsell set trains these regions better.

For future work, noise should be incorporated into the framework. The chosen Wiener estimation method has room to incorporate a noise term into the spectral estimation, and the impact of different kinds of noise should be studied. The framework also allows simulation and comparison of spectral filter array cameras in different spectral ranges. Near infrared should be considered for future work as it is used in traditional oximetry systems. Furthermore, oxygenation estimation methods that use the full spectra based on inverse Monte Carlo methods should be tested in conjunction with spectral reflectance reconstruction.

#### 5.4. Summary and Conclusions

A straightforward framework to evaluate spectral filter array cameras based on spectral sensitivities and publicly available skin and reflectance databases was proposed. It allows comparing and quantifying the performance of SFA cameras for medical applications and skin imaging in particular. The framework does not require prior measurements and is based on a readily available skin database for testing, a proposed generated skin simulation database and the sensor sensitivities of the cameras included.

Reconstructing full reflectances from sensor responses allows comparison of cameras and is useful when the application-specific bands of interest are unknown. It can be useful to recreate color images and benefits from a specialized training set. If the bands of interest are known, a camera with high sensitivity for those exact bands is advisable. Several observations particular to spectral filter array cameras were made:

- Spectral shapes of the filters should be adapted to the specific application.
- The spectral bands should be chosen carefully for the specific application.
- Selecting an optimal training set for spectral reflectance reconstruction improves the results for SFAs with narrow spectral sensitivities.
- GSB improve spectral reconstruction considering $\Delta {E}_{00}$ color differences and RMSE.
- GSB have a small impact on oxygenation level estimation if the bands are not close to the ideal wavelengths for oxygen estimation.

The framework has been applied to compare commercially available SFA cameras for skin diagnosis and skin oxygenation level estimation.

The corrected Ximea camera performed the best in terms of oxygenation level estimations. Regarding the spectral reconstruction and $\Delta {E}_{00}$ color difference metrics, the Silios camera shows the best results, which recommends it for applications where the specific bands of interest are not known.

SFA cameras hold great potential for monitoring vital functions and medical diagnosis as a non-contact, real-time spectral imaging modality. This framework provides a basis for using spectral filter array cameras effectively for medical applications. It can be used to design spectral filter sensitivities for specific applications by optimizing the wavelength bands and transmission shapes of the filters. It is, however, necessary to verify the findings with experimental data and extend the framework to include spatial aspects.