Data-Driven Calibration Estimation for Robust Remote Pulse-Oximetry

Pulse-oximetry has become a core monitoring modality in most fields of medicine. Typical dual-wavelength pulse-oximeters estimate blood oxygen saturation (SpO2) levels from the relationship between the amplitudes of red and infrared photoplethysmographic (PPG) waveforms. When captured with a camera, the PPG waveforms are much weaker, and consequently the measurement is more sensitive to distortions and noise. Therefore, an indirect method has recently been proposed in which, instead of extracting the relative amplitudes from the individual waveforms, the waveforms are linearly combined to construct a collection of pulse signals with different pulse signatures, each corresponding to a specific oxygen saturation level. This method has been shown to outperform conventional ratio-of-ratios based methods, especially when a third wavelength is added. Adding wavelengths, however, complicates the calibration, and inaccuracies in the calibration model threaten the performance of the method. Opto-physiological models have previously been shown to provide useful estimates of the calibration parameters. In this paper, we show that the accuracy can be improved using a data-driven approach. We performed 5-fold cross validation on recordings with variations in oxygen saturation and optimized for pulse quality. With the calibration from the proposed method, all evaluated wavelength combinations, including those without visible red, meet the accuracy required by the ISO standard. This scalable approach is not only helpful to fine-tune the calibration model, but even allows the calibration model parameters to be computed from scratch without prior knowledge of the data acquisition details, i.e., the properties of camera and illumination.


Introduction
Ensuring adequate oxygen delivery to tissues is a prime objective of acute/critical medical care. Pulse-oximetry allows rapid and convenient measurement of the arterial oxygen saturation with low-cost hardware. It is regarded as the fifth vital sign and enables early detection of hypoxia and screening for critical congenital heart disease (CCHD) [1] and sleep apnea [2]. After the discovery of the basic principle by Aoyagi in the mid-1970s, the first pulse-oximeters were successfully marketed in the 1980s. Nowadays, pulse-oximeters are ubiquitously used in operating theatres, hospital wards, outpatient clinics and general practice surgeries for the monitoring of the critically ill. Before the invention of the pulse-oximeter, invasive arterial blood sampling with subsequent analysis was required. Pulse-oximetry relies on the optical absorption properties of oxygenated and deoxygenated blood measured by a non-invasive optical source-detector geometry, allowing a continuous estimation of blood oxygen saturation. However, pulse-oximeters are prone to motion artifacts and require skin contact.
Over the last few decades, methods have been presented that allow contactless monitoring of human vital signs with regular cameras in both visible and near-infrared conditions, based on the detection of visually imperceptible skin color variations caused by blood volume variations. These vital signs include heart rate and derived features [3], respiration [4,5], and, more recently, oxygen saturation [4,6-10]. Contactless monitoring is especially relevant for: (1) scenarios where direct skin contact should be prevented (e.g., premature infants [11,12] or patients with extensive burns [13]), (2) spot-check scenarios where a reading should be available within seconds, or (3) scenarios where the current diagnostic procedure could affect the clinical outcome (e.g., polysomnography for sleep monitoring [14-16]). The main challenge of contactless measurements is that the signal is generally much weaker than its contact-based counterpart and hence more easily distorted. This is particularly detrimental for measurements where amplitudes have to be extracted from individual PPG waveforms, e.g., the measurement of peripheral blood oxygen saturation (SpO2). This motivated the development of an indirect method for the estimation of SpO2 using SpO2-dependent pulse signatures [8], which does not require the individual extraction of amplitudes. This method, denoted "APBV" (Adaptive-PBV), showed the feasibility of estimating SpO2 from distorted PPG signals, e.g., during movement, where the conventional ratio-of-ratios based method becomes inaccurate. Proper calibration is essential for applying this method. Opto-physiological modelling provides a reasonable estimate but is not sufficiently accurate. Hence, commercially available pulse-oximeters are calibrated based on empirical data.
In this paper, we present an empirical, data-driven approach to estimate a calibration model for the indirect APBV method, which can be applied to an arbitrary number and selection of wavelengths. This is relevant because, in our earlier paper [8], we showed that adding a third wavelength improves the robustness of the measurement. We demonstrate that, based on a dataset with only small variations in SpO2, an accurate calibration model can be determined that outperforms a calibration model based on modelling of the PPG spectrum. Furthermore, we show that the error increases only slightly when using solely near-infrared (NIR) wavelengths compared to the typical combination of visible red and NIR wavelengths, which confirms our earlier observations [8] and widens the scope for camera-based pulse-oximetry, e.g., for sleep or continuous 24/7 monitoring, where patients are monitored in full darkness.

Background
Contactless, camera-based measurement of SpO2 is a relatively new field of research. After pioneering work by Wieringa [6], Kong [7] and Tarassenko [4], Verkruysse et al. [9] were the first to demonstrate the feasibility of calibratable SpO2 estimation with a camera under normoxic and hypoxic conditions, which is non-trivial because of the fundamental differences in source-detector geometry. The conventional contact source-detector geometry collects light that has travelled through relatively deep vasculature (in both transmissive and reflective mode), whereas the contactless, wide-field illumination-detection geometry predominantly collects light that has travelled through much shallower tissue depths over much smaller distances. Verkruysse et al. used the common ratio-of-ratios approach and noted that low signal strength and subject motion are critical challenges that must be addressed to make camera-based pulse-oximetry practically feasible. As the name implies, with ratio-of-ratios, the ratio of two DC-normalized PPG amplitudes is calculated and linked to an SpO2 value via a calibration curve or look-up table, based on the assumption that the pulsatile (AC) component originates from variations in arterial blood only. This principle works well for clean signals, but becomes inaccurate when the signals are noisy or corrupted by motion artifacts. This motivated the development of the APBV method [8], inspired by the PBV method for camera-based pulse extraction by De Haan et al. [17], which uses the unique spectral signature of the blood volume pulse signal. A short description of the APBV method is provided in the next paragraph.

APBV
Instead of extracting amplitudes from the PPG waveforms and computing SpO2 with ratio-of-ratios, APBV determines SpO2 indirectly from the signal quality of the pulse signals extracted with individual signature vectors P_bv for each possible SpO2 value [8]. This is attractive, as the optimum, which indicates the SpO2 value, remains stable even when the signal-to-noise ratio (SNR) worsens. The APBV method is mathematically summarized in Equation (1), where P is the pulse signal, C_n contains the DC-normalized color variations, and the scalar k is chosen such that W_PBV has unit length. The calculation of the weights for extraction of the pulse signal, W_PBV, is formulated as a least squares problem using pulse signatures P_bv for different SpO2 values. The relation between the direct ratio-of-ratios approach and the indirect APBV method is expressed in terms of R, the ratio of normalized pulse amplitudes, and the calibration constants C_1 and C_2. The two most important implications of these formulations are that: (1) the use of ratio-of-ratios is limited to two wavelengths, whereas APBV can be executed with an arbitrary number of wavelengths, and (2) the calibration model of APBV, i.e., P_bv(SpO2), must be determined a priori. The first implication is important, as it has been shown that adding a third wavelength improves the robustness of the measurement during motion [8]. In this paper, we focus on the second implication, the APBV calibration model, and more specifically on the accuracy of the calibration estimation. The SpO2 signatures compiled in P_bv for N wavelengths are described by two vectors: P_bv^s denotes the (static) pulse signature vector for an oxygen saturation level of 100%, and P_bv^u denotes the update vector consisting of N parameters that describe the change of the signature for decreasing SpO2 values. In order to apply this model for SpO2 estimation, 2(N − 1) parameters have to be determined.
The values for one wavelength can be set to a fixed value for both P_bv^s (e.g., 1) and P_bv^u (0), as the values in both vectors describe the ratio in pulse amplitude between the wavelengths. We will now discuss how the parameters can be determined based on opto-physiological modelling. Hereafter, we will present our proposed empirical approach.
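The equations referred to in this section are not reproduced in this version of the text. For orientation, a minimal reconstruction is sketched below; it is our reading, not a verbatim copy of the original. The first line follows the PBV formulation of De Haan et al. [17] combined with the search over candidate SpO2 values that defines APBV [8], with Q the pulse-quality metric; the second line is the standard linear ratio-of-ratios calibration; the third line assumes that P_bv changes linearly with (100 − SpO2), which matches the description of P_bv^s and P_bv^u, but the sign convention is our assumption. The last line shows, for two wavelengths with λ2 fixed to unity (so P_bv = [R, 1]), how C_1 and C_2 map onto the 2(N − 1) = 2 free parameters.

```latex
\begin{aligned}
\widehat{\mathrm{SpO_2}} &= \arg\max_{\mathrm{SpO_2}} \, Q\big(\mathbf{P}(\mathrm{SpO_2})\big), \quad
\mathbf{P} = \mathbf{W}_{PBV}\,\mathbf{C}_n, \quad
\mathbf{W}_{PBV} = k\,\mathbf{P}_{bv}(\mathrm{SpO_2})\left(\mathbf{C}_n\mathbf{C}_n^{\top}\right)^{-1}\\
\mathrm{SpO_2} &= C_1 - C_2\,R\\
\mathbf{P}_{bv}(\mathrm{SpO_2}) &= \mathbf{P}_{bv}^{s} + (100 - \mathrm{SpO_2})\,\mathbf{P}_{bv}^{u}\\
N = 2:\quad R &= \frac{C_1 - \mathrm{SpO_2}}{C_2}
 = \underbrace{\frac{C_1 - 100}{C_2}}_{P^{s}_{bv,1}} + (100 - \mathrm{SpO_2})\underbrace{\frac{1}{C_2}}_{P^{u}_{bv,1}}
\end{aligned}
```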

Opto-Physiological Modelling
In order to determine the parameters for the calibration model, the PPG amplitude spectrum can be modelled, as we did in our earlier work [8]. A first-order approach to estimate the relative PPG amplitudes and their dependence on wavelength is to simply weigh the absorptivity spectra of (deoxy)hemoglobin (Hb) and oxyhemoglobin (HbO2). However, this would disregard the complex interaction of light with the various skin layers. Since pulsatile arterioles are not distributed homogeneously throughout the skin, different wavelengths will experience different degrees of PPG intensity modulation, not only due to the wavelength-dependent blood absorption coefficient, but also due to their different skin penetration depths. Various skin properties, such as dermal scattering, venous blood concentration and blood vessel diameter [18], affect these penetration depths differently for different wavelengths. Therefore, we modeled diffuse reflectance for a layered skin model using the analytic approach by Svaasand et al. [19]. We also adopted their best guesses for the geometrical and optical properties of normal skin (e.g., epidermal and dermal scattering = 29×10^6 λ). Three layers were modeled: epidermis, upper dermis and deep dermis, with thicknesses of 0.05, 0.5 and 4 mm, respectively, and skin blood volume fractions of 0%, 1% and 2%, respectively, for both venous and arterial blood. The saturation of venous blood was 70%, while four different values were modeled for arterial blood: 70%, 80%, 90% and 100%, representing four SpO2 values. At each arterial saturation level, we computed a PPG modulation as the normalized diffuse reflectance difference between a systolic and a diastolic phase (modeled with slightly more and slightly less arterial blood, respectively). One important departure from the modeling by Svaasand et al., however, is that we used the original absorptivities for Hb and HbO2 measured by Zijlstra and Buursma [20].
Using their tabulated values (their Table 3) along with the plots (their Figure 2), we interpolated absorptivities for Hb and HbO2 at 5 nm increments from 400 to 1000 nm. The modeled relative PPG spectrum is visualized in Figure 1. Finally, the SpO2-dependent parameters of the pulse signature vector P_bv can be determined with Equation (4), where I(λ), Cam(λ) and Filter(λ) are the illumination spectrum, the camera sensitivity and the transmission spectrum of the filter, respectively. This model is very generic, but practical values had to be chosen for all parameters. The values we used, which also determined the experimental setup used to create our dataset (Section 2.2.3), led to the APBV calibration model of Equation (5).
Figure 1. The modeled relative PPG spectrum using the analytic approach by Svaasand et al. [19] for four SpO2 values.
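Equation (4) amounts to weighting the modeled PPG spectrum by the effective spectral sensitivity of each camera channel. The Python sketch below shows that computation under stated assumptions: spectra sampled on the 5 nm grid used for Figure 1, a hypothetical Gaussian filter shape (`gaussian_filter`), and normalization of each weighted PPG amplitude by the channel's weighted DC, i.e., the sum of I·Cam·Filter, which is our assumption about how the relative amplitudes are formed. Function names are illustrative only.

```python
import numpy as np

# 5 nm grid from 400 to 1000 nm, matching the modeled PPG spectrum.
wavelengths = np.arange(400.0, 1000.0 + 1.0, 5.0)

def gaussian_filter(center_nm, fwhm_nm):
    """Hypothetical filter transmission: a Gaussian with the given center and FWHM."""
    sigma = fwhm_nm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    return np.exp(-0.5 * ((wavelengths - center_nm) / sigma) ** 2)

def signature_from_spectra(ppg, illumination, camera, filters, ref_index):
    """Relative PPG amplitude per channel, normalized so that channel ref_index equals 1.

    Each channel weights the relative PPG spectrum by I(λ)·Cam(λ)·Filter_i(λ);
    on a uniform grid the wavelength step cancels, so sums replace the integrals.
    """
    amps = np.array([
        np.sum(ppg * illumination * camera * f) / np.sum(illumination * camera * f)
        for f in filters
    ])
    return amps / amps[ref_index]
```

With a flat PPG spectrum every channel receives the same amplitude, so the signature is all ones; a PPG spectrum that increases with wavelength yields entries above 1 for channels redder than the reference channel.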

Empirical
In the current research, we aim to provide an empirical, data-driven alternative to the modeling approach, and we compare it with the modeling-based approach to show that it is more accurate and more generic.

Framework
The general framework of our calibration approach is visualized in Figure 2 and consists of three steps. The first step is to extract the raw PPG signals from the videos. In the second step, an exhaustive search is executed over a range of possible pulse signatures of normalized amplitudes, P_bv, and the signature corresponding to the best pulse-signal quality is selected as the optimum. Finally, a relative calibration model is fitted to the collection of estimated signature vectors, and the reference data are used to arrive at the APBV calibration model with vectors P_bv^s and P_bv^u, which can be used for SpO2 estimation. The three main differences between this approach and the previous calibration method for camera-based SpO2 [9] are that: (1) the method can be applied for the indirect APBV method with an arbitrary number of wavelengths, (2) it can be applied to short time windows, which allows SpO2 dynamics to be captured, and (3) the camera measurements are not directly linked to SpO2 values from the reference. In the next paragraphs, we discuss the steps of our proposed calibration approach in more detail.

Pre-Processing
We extracted the PPG signals from a forehead region-of-interest (ROI) obtained by triangulation of facial landmark points. We spatially averaged the pixel intensities within the ROI for each wavelength and concatenated the values over the video frames. For each time window of 10 s, we divided the raw PPG signal of each wavelength (C_i) by its quasi-DC signal obtained by low-pass filtering (LPF), and bandpass filtered (BPF) the result to obtain the DC-normalized PPG waveforms C_n. The LPF is a first-order Butterworth filter with a cut-off frequency of 0.7 Hz, and the BPF is a fourth-order Butterworth filter with a passband of 0.7-4 Hz (42-240 beats per minute), the typical range of pulse rates for healthy adults. The DC-normalized PPG waveforms C_n are computed with a step size of 1 s and serve as input for the exhaustive search.
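A minimal sketch of this pre-processing step for one 10 s window, assuming SciPy's `butter` and `sosfiltfilt`. Zero-phase filtering and the exact form of the normalization (raw divided by the low-passed signal, minus 1) are our choices; the text does not state them. Note that `butter(4, ..., btype="bandpass")` yields an eighth-order band-pass (order 4 per band edge), the same convention as MATLAB's `butter`; which convention "fourth-order" refers to in the text is not stated. The function name `dc_normalize` is hypothetical.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def dc_normalize(raw, fs, f_lo=0.7, f_hi=4.0):
    """DC-normalize and band-pass one window of raw PPG traces.

    raw: array of shape (n_wavelengths, n_samples) with ROI-averaged intensities C_i.
    Returns C_n with the same shape.
    """
    raw = np.asarray(raw, dtype=float)
    # Quasi-DC estimate: first-order Butterworth low-pass with a 0.7 Hz cut-off.
    sos_lp = butter(1, f_lo, btype="low", fs=fs, output="sos")
    dc = sosfiltfilt(sos_lp, raw, axis=1)
    # Band-pass 0.7-4 Hz on the DC-normalized signal (second-order sections for stability).
    sos_bp = butter(4, [f_lo, f_hi], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos_bp, raw / dc - 1.0, axis=1)
```

Because the operations are (to first order) linear in the small pulsatile component, the ratios between the channel amplitudes, which are what the pulse signature encodes, are preserved.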

Exhaustive Search
As mentioned in Section 2.2, the signature vector P_bv comprises the relative pulse amplitudes of the wavelengths. Therefore, for N wavelengths, the search has to be performed over N − 1 dimensions, with the pulse amplitude of one wavelength arbitrarily set to unity. For a three-wavelength system, the candidate signatures can be formulated as P_bv = [α, 1, β], where the exhaustive search is performed over a range of α and β relative to the unit amplitude of λ2. For the wavelengths in our dataset, we set the evaluation range for α and β to 0-2 with a sampling resolution of 0.001. An example of the optimization for the wavelengths [λ1, λ2, λ3] = [675, 800, 905] nm is visualized in Figure 3. We are aware that exhaustive search is computationally expensive and that more efficient approaches exist, e.g., stochastic gradient descent or simulated annealing. However, as accuracy was our priority, we did not attempt to optimize computational efficiency.
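The exhaustive search can be sketched as below for three wavelengths, with λ2 fixed to unity. The sketch uses the PBV weights of De Haan et al. [17] and the spectral-skewness quality metric described in the Selection Criterion subsection; the coarse 0.05 grid (instead of 0.001) only keeps the synthetic example fast. Function names are hypothetical, and the synthetic data in the test are made up.

```python
import numpy as np

def spectral_skewness(pulse):
    """Quality metric: skewness of the magnitude spectrum of the pulse signal."""
    h = np.abs(np.fft.rfft(pulse))
    d = h - h.mean()
    return np.mean(d ** 3) / np.mean(d ** 2) ** 1.5

def pbv_pulse(c_n, p_bv):
    """PBV extraction: W = k * P_bv (C_n C_n^T)^-1 with ||W|| = 1, then P = W C_n."""
    # C_n C_n^T is symmetric, so solving with P_bv equals P_bv (C_n C_n^T)^-1.
    w = np.linalg.solve(c_n @ c_n.T, p_bv)
    w /= np.linalg.norm(w)
    return w @ c_n

def exhaustive_search(c_n, grid):
    """Evaluate every P_bv = [alpha, 1, beta] on the grid; return (quality, alpha, beta) of the best."""
    best = (-np.inf, None, None)
    for alpha in grid:
        for beta in grid:
            q = spectral_skewness(pbv_pulse(c_n, np.array([alpha, 1.0, beta])))
            if q > best[0]:
                best = (q, alpha, beta)
    return best
```

On synthetic data with a known signature, the best grid point lands at or next to the true (α, β). At the 0.001 resolution used in the text, the same double loop would evaluate 2001² ≈ 4 × 10^6 candidates per window.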

Selection Criterion
The SpO2 value corresponds to the signature vector that yields the best pulse-signal quality (Equation (1)). To assess the pulse quality, we used the skewness of the pulse spectrum. The rationale behind this metric is that the spectrum of a clean pulse is highly peaked (i.e., has a high skewness), whereas a noisy signal has a clearly lower skewness. The quality metric Q is the skewness of H_P, where H_P denotes the frequency spectrum of the pulse signal P (Equation (1)) and H̄_P is the average of all spectral components of H_P. The example for two different oxygen saturation levels in Figure 3 shows clear, distinct optima (α*, β*) using skewness as the quality metric, which shift as a function of SpO2. For each processing window, we stored the estimated pulse signature vector together with the corresponding value of the quality metric.
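Written out, the quality metric is the sample skewness of the spectrum. The form below assumes M spectral components and population (1/M) moments; the text does not specify the normalization or which frequency band of H_P enters the sum.

```latex
Q = \frac{\frac{1}{M}\sum_{m=1}^{M}\left(H_P[m] - \bar{H}_P\right)^{3}}
         {\left(\frac{1}{M}\sum_{m=1}^{M}\left(H_P[m] - \bar{H}_P\right)^{2}\right)^{3/2}},
\qquad
\bar{H}_P = \frac{1}{M}\sum_{m=1}^{M} H_P[m]
```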

Model Fitting
After obtaining the collection of estimated signature vectors, the relative SpO2 contrast between the wavelengths, i.e., the change in pulse amplitude as a function of the SpO2 value, has to be determined for the update vector P_bv^u. We accomplished this by sorting the estimated relative amplitudes of the wavelength with the largest SpO2 contrast in ascending order, as visualized in Figure 4, where the corresponding values of the other wavelengths are displayed at the same location on the horizontal axis. To prevent overfitting on the few samples corresponding to the lowest and highest SpO2 values, first-order polynomials were fitted on the sorted samples to estimate the relative SpO2 sensitivity, using iteratively re-weighted least squares with a bisquare weighting function. Here, the weights are the values of the quality metric of each estimated signature vector. The slopes of the N − 1 fits are a scaled version of the update vector P_bv^u, indicated with P_bv^u* in Figure 4. In the next paragraph, we explain how the vector set for the APBV calibration model can be derived from the vectors P_bv^s* and P_bv^u*.
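Iteratively re-weighted least squares with a bisquare (Tukey) weighting can be sketched in a few lines of NumPy. This is a generic implementation, not the authors' code: the tuning constant c = 4.685 and the MAD-based residual scale are common defaults, the leverage adjustment that some implementations apply is omitted, and `prior_w` is where the pulse-quality values would enter as fixed per-sample weights. The function name is hypothetical.

```python
import numpy as np

def irls_bisquare_line(x, y, prior_w=None, c=4.685, n_iter=50, tol=1e-10):
    """Robust first-order fit y ≈ a + b*x with Tukey's bisquare weights (IRLS).

    prior_w: optional fixed per-sample weights (e.g., pulse-quality values);
    they multiply the robust weights in every iteration. Returns (a, b).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    prior = np.ones_like(x) if prior_w is None else np.asarray(prior_w, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    robust = np.ones_like(x)
    coef = np.zeros(2)
    for _ in range(n_iter):
        # Weighted least squares via row scaling with sqrt(weights).
        sw = np.sqrt(prior * robust)
        new_coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        resid = y - X @ new_coef
        # Robust residual scale: median absolute deviation rescaled for Gaussian noise.
        scale = np.median(np.abs(resid - np.median(resid))) / 0.6745
        done = np.max(np.abs(new_coef - coef)) < tol or scale == 0.0
        coef = new_coef
        if done:
            break
        # Bisquare weights: samples beyond c * scale get zero weight.
        u = resid / (c * scale)
        robust = np.where(np.abs(u) < 1.0, (1.0 - u ** 2) ** 2, 0.0)
    return coef[0], coef[1]
```

Samples far from the line receive zero robust weight, so a few extreme samples at either end of the sorted sequence cannot pull the slope.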

Determining the APBV Calibration Model
In order to determine the APBV calibration model, the range of SpO2 values present in the dataset has to be incorporated. Contrary to most calibration methodologies, we did not link each data point (pulse signature vector) to an SpO2 value from the reference. Rather, we looked at the range and distribution of SpO2 values to reduce the effects of outliers and to prevent overfitting on short-term desaturations with unknown physiological delays between the SpO2 traces from the face (camera) and the fingers (reference). The procedure is illustrated in Figure 5. Similar to the collection of estimated pulse signatures, we fitted a first-order polynomial to the sorted SpO2 values of the reference data. The ∆SpO2 of this fit is used to determine P_bv^u by scaling P_bv^u*. The signature vector for 100% SpO2, P_bv^s, is calculated by linear regression using the maximum value of the fit, SpO2^max, and the update vector P_bv^u.
Figure 5. The APBV calibration model is derived from P_bv^s* and P_bv^u* by incorporating reference values. The update vector P_bv^u is determined by scaling P_bv^u* with the ∆SpO2 of the reference. Hereafter, P_bv^s is calculated by linear regression using SpO2^max and P_bv^u.
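The scaling in this step can be made explicit if both fits are expressed against a common normalized rank t in [0, 1]. The sketch below is our interpretation, not the authors' code, and rests on assumptions: the signature model P_bv(SpO2) = P_bv^s + (100 − SpO2) P_bv^u; the reference SpO2 sorted in descending order, so that increasing t corresponds to decreasing SpO2 (as for the ascending amplitudes of a red channel); and c + d·t as the reference fit. Eliminating t gives P_bv^u = −b/d (the slopes P_bv^u* rescaled by the SpO2 change) and P_bv^s = a + b(100 − c)/d (the fitted signature at SpO2^max = c, extrapolated to 100%). The function name is hypothetical.

```python
import numpy as np

def calibration_from_fits(amp_fits, spo2_fit):
    """Derive (P_s, P_u) from first-order fits against a shared normalized rank t.

    amp_fits: (N, 2) array; row i holds (intercept a_i, slope b_i) for wavelength i.
    spo2_fit: (c, d), the fit SpO2(t) = c + d*t of the descending-sorted reference.
    Assumes P_bv(SpO2) = P_s + (100 - SpO2) * P_u.
    """
    a, b = np.asarray(amp_fits, dtype=float).T
    c, d = spo2_fit
    p_u = -b / d                     # slopes P_u* rescaled to a change per % SpO2
    p_s = a + b * (100.0 - c) / d    # signature at SpO2^max = c, extrapolated to 100%
    return p_s, p_u
```

In the test, the fits come from `np.polyfit(t, y, 1)`, which returns (slope, intercept), hence the reversal.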

Dataset

Experimental Setup
The experimental setup consists of four monochrome CCD cameras (AVT Manta G-283B, Allied Vision GmbH, Stadtroda, Germany) equipped with four identical 150 mm lenses (Schneider-Kreuznach 7805791, Bad Kreuznach, Germany). To obtain spectral selectivity, optical bandpass filters with center wavelengths (CWLs) of 675, 760, 800 and 905 nm were used; their transmission spectra are visualized in Figure 6. The cameras were externally triggered at a stable frame rate of 15 Hz and were spaced horizontally by 9 cm. The frames from the four cameras were registered using an affine transformation. Illumination was provided by two light fixtures (Falcon Eyes, Hong Kong, China), each equipped with 9 incandescent lamps (60 W, Philips, Amsterdam, the Netherlands), at a distance of about 1.5 m from the subject. The lamps were powered by a current-limited DC power supply set to 210 V, 3.95 A (SM330-AR-22, Delta Elektronica, Zierikzee, the Netherlands). For SpO2 reference, we used four conventional SpO2 probes coupled to Philips MP2 patient monitors: a Philips finger sensor and a Philips ear sensor (M1191B and M1194A, Philips Medizin-Systeme, Böblingen, Germany), a Masimo finger sensor (LNCS DC-I, Masimo Corporation, Irvine, CA, USA), and a Nellcor finger sensor (DS-100A, Medtronic, Dublin, Ireland). The sample-wise (1 Hz) median of all four probes was defined as the reference signal.
Figure 6. The experimental setup comprises two light fixtures with incandescent light bulbs (blue) and four identical monochrome cameras (red) equipped with different optical filters (black) to obtain spectral selectivity. This information is used to calculate the values for the pulse signature vector (Equation (4)).

Protocol
The dataset consists of 25 recordings of 7 min each of one subject. The subject was asked to sit upright on a chair at a distance of approximately 8 m from the cameras. The head was supported by a soft support on the left side of the face to prevent involuntary ballistocardiographic movements, which could affect the calibration process. The recording of the reference data was started 20 s before and stopped 20 s after the video recordings to allow synchronization of the camera and contact data, given the processing and physiological delays. During the first 3.5 min of the recording, the subject was asked to breathe deeply but calmly and non-guided, resulting in a near-maximal SpO2 value. In the following 3.5 min, the subject was asked to breathe more shallowly, which led to an average SpO2 reduction of 3% with short-term desaturations as low as 86%. The rationale for this protocol is that it is easily repeatable and leads to quasi-stable SpO2 plateaus, in contrast to our earlier protocol with breath-hold events [8], which led to short dips in SpO2 that cannot be used for calibration. The histograms of the SpO2 values during both stages of the protocol are visualized in Figure 7.
Figure 7. The breathing protocol used for the creation of our dataset, 3.5 min of deep breathing followed by 3.5 min of shallow breathing, leads to an average SpO2 reduction of 3% with short-term desaturations as low as 86%.

Alignment Camera-Reference
In order to compute evaluation metrics, the SpO 2 values from the camera and reference finger-probes needed to be aligned in time. The delay between both is a combination of nonlinear processing (reference values from black box algorithms with unknown (post-)processing and heuristics) and physiological delays (face versus finger). Therefore, we performed an individual synchronization of both data streams.
First, the data from the finger-probes were resampled at the sampling rate of the camera. Hereafter, we subtracted the mean from both the camera and reference traces to be less susceptible to a possible bias, and determined the delay using cross-correlation resulting in 22.6 ± 3.47 s (mean ± SD) for the dataset.
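A minimal sketch of this delay estimation with NumPy's `np.correlate`: both traces are mean-subtracted and assumed to share one sampling rate, and the search is limited to ±`max_lag_s` (a hypothetical parameter, set here to 40 s) so that spurious peaks at very large lags are ignored. A positive result means that the reference lags the camera. The function name is illustrative.

```python
import numpy as np

def estimate_delay(camera, reference, fs, max_lag_s=40.0):
    """Delay (s) by which `reference` lags `camera`, from the peak of the
    cross-correlation of the mean-subtracted traces (both sampled at fs)."""
    cam = np.asarray(camera, dtype=float) - np.mean(camera)
    ref = np.asarray(reference, dtype=float) - np.mean(reference)
    # np.correlate(ref, cam, "full")[j] = sum_n ref[n + k] * cam[n], with k = j - (len(cam) - 1).
    xc = np.correlate(ref, cam, mode="full")
    lags = np.arange(-(len(cam) - 1), len(ref))
    keep = np.abs(lags) <= int(round(max_lag_s * fs))
    return lags[keep][np.argmax(xc[keep])] / fs
```

Subtracting the means, as described in the text, keeps a constant offset between camera and reference from dominating the correlation.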

Evaluation Metrics
To evaluate the performance of the camera-based SpO2 estimates, we computed the root-mean-square error (RMSE) and the bias (mean difference) with respect to SpO2^Ref, the median of all four contact probes, which improves the reliability of the reference. The RMSE is used as the metric in the ISO standard (80601-2-61, 2019) [21] for pulse-oximeters when compared to a reference obtained by arterial blood gas analysis (SaO2). The ISO standard requires an RMSE <4% in the range of 70% to 100% SaO2.
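Both metrics, and the median reference, reduce to a few NumPy lines. The sketch assumes the error is defined as camera minus reference (so a negative bias means the camera underestimates SpO2); the text does not state the sign convention. Function names are illustrative.

```python
import numpy as np

def reference_spo2(probe_readings):
    """Sample-wise median over probes; probe_readings has shape (n_probes, n_samples)."""
    return np.median(np.asarray(probe_readings, dtype=float), axis=0)

def rmse_and_bias(spo2_cam, spo2_ref):
    """RMSE and bias (mean of camera minus reference), both in % SpO2."""
    err = np.asarray(spo2_cam, dtype=float) - np.asarray(spo2_ref, dtype=float)
    return float(np.sqrt(np.mean(err ** 2))), float(np.mean(err))
```

With four probes, the median is the mean of the two middle readings at each sample, so a single misbehaving probe cannot shift the reference on its own.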

Results
We computed the results for both the model-based calibration model of Section 2.2.1 and the calibration determined by the proposed data-driven, empirical approach. To estimate the APBV signature vector sets for the two- and three-wavelength combinations, we used 5-fold cross validation, resulting in the calibration models of Equation (10). The APBV vector for all four wavelengths was derived from the two three-wavelength models, where the values for the 905 nm wavelength were calculated as the mean of the [675, 800, 905] nm and [760, 800, 905] nm calibration models.
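The text does not specify how recordings were assigned to folds. One plausible reading, sketched below as an assumption, is a split at the recording level: the 25 recordings are shuffled into five folds of five, a calibration model is estimated on four folds (20 recordings) and evaluated on the held-out fold. The function name and seed are hypothetical.

```python
import numpy as np

def five_fold_splits(n_recordings=25, k=5, seed=0):
    """Yield (train, test) arrays of recording indices for k-fold cross validation."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_recordings), k)
    for i, test in enumerate(folds):
        # Every fold except the i-th is used for calibration.
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield np.sort(train), np.sort(test)
```

Splitting by recording rather than by processing window keeps overlapping 10 s windows from the same recording out of both the calibration and evaluation sets at once.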
We computed the SpO2 values with processing windows of 10 s and a step size of 1 s, and used a 10 s moving average filter for post-processing. To prevent clipping, which would bias the results, we sampled pulse signature vectors within the SpO2 range of 75-110% with a sampling resolution of 0.1%. The overall results are displayed in Table 1. The average SpO2 traces of the dataset (n = 25) calculated with two three-wavelength combinations for both the model-based and the empirical calibration model are visualized in Figure 8. The SpO2 traces for all recordings are displayed in Figure 9. It can be observed that, although the same breathing protocol was used for all recordings, the dynamics are not identical. An overview of the evaluation metrics (pulse) quality, RMSE and bias for all recordings in the dataset, using the model-based and empirically derived calibration models, is provided in Figure 10.
Figure 10. An overview of the evaluation metrics (pulse) quality, root-mean-square error (RMSE) and bias for all recordings in the dataset using the model-based and empirically derived calibration models. The results are calculated for two dual-wavelength and two three-wavelength combinations, and the derived four-wavelength model.
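Per-window APBV estimation over the 75-110% candidate grid, followed by the 10 s moving average, can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the linear signature model P_bv(SpO2) = P_bv^s + (100 − SpO2) P_bv^u, the spectral-skewness metric, and the calibration vectors in the test are ours (made-up values, not those of Equation (10)). Function names are hypothetical.

```python
import numpy as np

def spectral_skewness(pulse):
    """Quality metric: skewness of the magnitude spectrum of the pulse signal."""
    h = np.abs(np.fft.rfft(pulse))
    d = h - h.mean()
    return np.mean(d ** 3) / np.mean(d ** 2) ** 1.5

def apbv_spo2(c_n, p_s, p_u, candidates=np.arange(75.0, 110.0001, 0.1)):
    """Return the candidate SpO2 whose signature P_s + (100 - SpO2) * P_u
    gives the PBV pulse with the highest spectral skewness."""
    q_inv = np.linalg.inv(c_n @ c_n.T)  # shared by all candidates within this window
    best_q, best_s = -np.inf, None
    for s in candidates:
        w = q_inv @ (p_s + (100.0 - s) * p_u)
        pulse = (w / np.linalg.norm(w)) @ c_n
        q = spectral_skewness(pulse)
        if q > best_q:
            best_q, best_s = q, s
    return best_s

def moving_average(x, n=10):
    """Boxcar average over n consecutive estimates (n = 10 is 10 s at a 1 s step);
    mode="valid" drops the edges instead of padding them."""
    return np.convolve(x, np.ones(n) / n, mode="valid")
```

Allowing candidates above 100% means that an optimum near 100% is not forced onto the edge of the grid, which is the clipping the text refers to.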

Discussion
We evaluated the performance for different calibration models and for different (number of) wavelengths. In the next paragraphs, we discuss these results, together with our suggestions for future work.

Model-Based versus Empirical Calibration
The results presented in the previous section indicate that a calibration determined by opto-physiological modeling provides a reasonable estimate for only a subset of the evaluated wavelength combinations. In particular, for the evaluated wavelength combinations that include 760 nm, the miscalibration leads to an RMSE of >4%. The calibration model determined by the proposed data-driven approach yields a much smaller RMSE of <2%, well within the accuracy required by the ISO standard [21]. The results suggest that the calibration model determined by modeling could serve as a starting point for the empirical approach. This would allow the search range for the various wavelengths to be narrowed, which could greatly reduce the number of computations.

Red-NIR versus Full NIR
Most dual-wavelength pulse-oximeters are equipped with a red LED, typically 660 nm, and a NIR LED in the range 900-940 nm. These wavelengths are selected based on the SpO2 contrast, i.e., the absorption difference between Hb and HbO2, and the low absorbance of water and other species in blood and tissue. In transmissive and reflective pulse-oximeters, the illumination is shielded from the environment. For camera-based pulse-oximetry this is not the case, which has consequences for applications where visible light is not tolerated, e.g., sleep monitoring. When using solely near-infrared (NIR) wavelengths, which are (almost) invisible to the human eye, the SpO2 contrast decreases by a factor of 2, as can be observed from the estimated update vectors P_bv^u. A decreased SpO2 contrast increases the noise of the measurement. However, the PPG signal is also stronger in full NIR, which could positively impact the performance, given that camera PPG signals are up to 100 times smaller than those of conventional contact probes [9]. To investigate how both factors balance, we evaluated the performance for both red-NIR and full-NIR wavelengths.
The comparative results show that red-NIR yields slightly better performance than the full-NIR wavelength combinations. Both, however, satisfy the ISO standard of an RMSE <4%, which is very important as it greatly widens the scope of possible applications for camera-based pulse-oximetry, e.g., the detection of sleep apnea. It should be noted, however, that the generality of this observation still has to be verified on a population rather than on one subject. These results confirm our earlier observations [8], although with slightly different NIR wavelengths.

Two versus Three Wavelengths
As explained earlier, the rationale for using three instead of the common two wavelengths is that a higher-dimensional (pseudo-)color space allows projections that are orthogonal to more independent distortions, e.g., those caused by subject motion or illumination changes [8]. The number of calibration parameters, however, increases, which could affect the performance when they are not set correctly. The results show that the performance with three wavelengths is slightly better, even when the subject is static. Figure 10 shows that, with two wavelengths, a miscalibration does not affect the pulse quality but only the SpO2 error. With three wavelengths, both the pulse quality and the SpO2 error are affected. This can be explained by the 'flat' optimum when using more than two wavelengths. With two wavelengths, an optimum always exists because of the one-dimensional space. With more than two wavelengths, a miscalibration leads to 'squeezing' of the signature vectors, which could result in a combination of a bias and an increased RMSE.

Future Work
This study showed the importance and ease of empirical, data-driven calibration. We recognize that our study also has limitations. Firstly, the current implementation guarantees accuracy, i.e., finding the global optimum, but is computationally expensive. As the number of calculations scales exponentially with the number of wavelengths, optimization of the processing time may become relevant if one would like to explore the use of APBV with more than three wavelengths. This can be achieved by narrowing the search range for each wavelength, using dedicated processing hardware and/or using computationally efficient search strategies.
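As a worked example of this scaling, assuming that every free amplitude is searched over 0-2 at the 0.001 resolution used in this study, the number of quality evaluations per processing window is:

```latex
\begin{aligned}
n_{\text{eval}} &= \left(\frac{2 - 0}{0.001} + 1\right)^{N-1} = 2001^{\,N-1}\\
N = 2&:\quad 2001\\
N = 3&:\quad 2001^{2} = 4\,004\,001 \approx 4.0\times10^{6}\\
N = 4&:\quad 2001^{3} = 8\,012\,006\,001 \approx 8.0\times10^{9}
\end{aligned}
```

Each additional wavelength thus multiplies the cost by roughly 2000, which is why narrowing the per-wavelength search range is the most direct lever.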
We demonstrated the importance of empirical calibration for one subject with an average ∆SpO2 of about 3% and SpO2 values in the range 86-100%. We regard the results as a proof-of-concept and do not claim that the estimated calibration models for our setup yield the best results for a population. This requires a calibration dataset with subjects with different skin types and SpO2 values over a larger range, which is difficult to realize with the current breathing protocol. Re-use of our previous calibration dataset was not possible, as those data were recorded with only two wavelengths [9]. Based on our earlier findings [9], RMSE <1.65% (n = 26), we have no reason to assume that the calibration is highly subject-dependent.

Conclusions
An empirical, data-driven approach for the determination of a calibration model with an arbitrary number and selection of wavelengths for camera-based SpO2 estimation was proposed and tested. An exhaustive search yields the SpO2-dependent signature vectors corresponding to the relative amplitudes in the different wavelengths. Based on this collection of signature vectors, a calibration model is estimated for the indirect APBV method. We performed 5-fold cross validation on recordings with relatively small variations in oxygen saturation and optimized for pulse-signal quality. Results show that, with this approach, the performance improves compared to a calibration model based on opto-physiological modeling, with an average RMSE reduction of 1.84%. Furthermore, we showed that the error increases only slightly, by 0.29%, when using solely near-infrared (NIR) wavelengths compared to the combination of visible red and NIR wavelengths typically used in pulse-oximeters. This observation has important implications for the range of possible applications of camera-based pulse-oximetry, e.g., sleep or continuous 24/7 monitoring. This scalable approach is not only helpful to fine-tune the calibration model, but even allows the calibration model parameters to be computed from scratch without prior knowledge of the data acquisition details, i.e., the properties of camera and illumination.