Retrospective Continuous-Time Blood Glucose Estimation in Free Living Conditions with a Non-Invasive Multisensor Device

Even if still at an early stage of development, non-invasive continuous glucose monitoring (NI-CGM) sensors represent a promising technology for optimizing diabetes therapy. Recent studies showed that the Multisensor provides useful information about glucose dynamics with a mean absolute relative difference (MARD) of 35.4% in a fully prospective setting. Here we propose a method that, exploiting the same Multisensor measurements, but in a retrospective setting, achieves a much better accuracy. Data acquired by the Multisensor during a long-term study are retrospectively processed following a two-step procedure. First, the raw data are transformed to a blood glucose (BG) estimate by a multiple linear regression model. Then, an enhancing module is applied in cascade to the regression model to improve the accuracy of the glucose estimation by retrofitting available BG references through a time-varying linear model. MARD between the retrospectively reconstructed BG time-series and reference values is 20%. Here, 94% of values fall in zone A or B of the Clarke Error Grid. The proposed algorithm achieved a level of accuracy that could make this device a potential complementary tool for diabetes management and also for guiding prediabetic or nondiabetic users through life-style changes.


Introduction
Non-invasive continuous glucose monitoring (NI-CGM) has been widely investigated in the last years [1][2][3][4][5] because it would obviously represent an appealing technology to monitor glucose changes with no discomfort related to the use of subcutaneous needles [6][7][8] or implantable devices [6]. Although in terms of accuracy and reliability, the performance of NI-CGM is still far from that of commercial minimally-invasive CGM sensors, some encouraging results have been demonstrated in strictly controlled conditions during in-clinic sessions [5,[9][10][11]. However, the use of NI-CGM technology in uncontrolled conditions met in daily life has shown several critical aspects, mostly related to the influence of external perturbations, e.g., environmental factors and non-glucose related physiological confounders that influence the sensor measurements [10][11][12][13][14][15].
One interesting approach systematically evolved by Caduff et al. proposed to mitigate the effect of such perturbing factors is the multisensor concept, i.e., the embedment of glucose and non-glucose-related sensors in the same device [11,15]. The basic idea behind the multisensor approach is to measure perturbing effects together with the glucose-related effects to allow for proper compensation via suitable modelling [14,15]. In particular, Solianis Monitoring AG (Zurich, Switzerland) acquired by Biovotion AG (Zurich, Switzerland), proposed and developed a multisensor prospective estimate of glucose profiles by a global model, with only an initial calibration when the device is worn on the upper arm. The less accurate results, when compared with the current state-ofthe-art minimally invasive CGM devices [6] and the accuracy obtained with the Multisensor in controlled conditions [19], is related to the sub optimality of the models and the algorithmic routines that may not yet properly compensate for all the extrinsic (environmental) and intrinsic (physiological) perturbations typical of the uncontrolled conditions mentioned above. These limitations currently prevent the actual use of the Multisensor in real-time diabetes management. Here, we aim at assessing the performance of the Multisensor system in a less challenging scenario, i.e., in retrospectively estimating the blood glucose (BG) concentration given the raw sensor measurements and a few self-monitoring BG (SMBG) samples that serve as a reference.

Database
The database was acquired during a long-term study conducted with the Multisensor device [20] involving 20 patients with type 1 diabetes (13 male, 7 female, Caucasian origin, age 38 ± 13 years, bod mass index, BMI, 24.1 ± 3.0 kg m −2 , duration of diabetes 17.0 ± 13.0 years, HbA1c 7.5 ± 0.9%). Table  1 reports the subjects' demography and characteristics in finer detail. The study was performed in accordance with Good Clinical Practice and the Declaration of Helsinki. Figure 2 shows a graphical representation of the study procedure. After the initial screening visit (month 0), subjects entered block A of the study procedure, which consisted of one day of in-clinic sessions per subject. Following block A, subjects entered block B of the study design, which consisted of 10 days of home-use. This period of home-use was followed by another in-clinic period of 3 days and 2 nights per subject (block C). Then, the study ended with block D, a second period of home-use of at least 20 days for each subject. In total, the study lasted for 18 months and globally the following study days (or runs) were acquired: 20 days in block A, 200 days in block B, 99 days in block C, and 753 days in block D, for a total of 1072 runs (data acquired during the two in-clinic nights were not used in our analysis). The median length of all runs in the study was between 10 and 11 h. The minimum, maximum, and median durations of the home-use study blocks across subjects was 11, 79, and 31 days for block B and 29, 183, and 119 for block D. The most recent clinical study conducted with the Multisensor in free-living conditions showed a point accuracy of the estimated glucose versus a reference glucose measurement, computed as mean absolute relative difference (MARD), of 35.4% [18]. These results were obtained considering a fully prospective estimate of glucose profiles by a global model, with only an initial calibration when the device is worn on the upper arm. The less accurate results, when compared with the current state-of-the-art minimally invasive CGM devices [6] and the accuracy obtained with the Multisensor in controlled conditions [19], is related to the sub optimality of the models and the algorithmic routines that may not yet properly compensate for all the extrinsic (environmental) and intrinsic (physiological) perturbations typical of the uncontrolled conditions mentioned above. These limitations currently prevent the actual use of the Multisensor in real-time diabetes management. Here, we aim at assessing the performance of the Multisensor system in a less challenging scenario, i.e., in retrospectively estimating the blood glucose (BG) concentration given the raw sensor measurements and a few self-monitoring BG (SMBG) samples that serve as a reference.

Database
The database was acquired during a long-term study conducted with the Multisensor device [20] involving 20 patients with type 1 diabetes (13 male, 7 female, Caucasian origin, age 38 ± 13 years, bod mass index, BMI, 24.1 ± 3.0 kg m −2 , duration of diabetes 17.0 ± 13.0 years, HbA1c 7.5 ± 0.9%). Table 1 reports the subjects' demography and characteristics in finer detail. The study was performed in accordance with Good Clinical Practice and the Declaration of Helsinki. Figure 2 shows a graphical representation of the study procedure. After the initial screening visit (month 0), subjects entered block A of the study procedure, which consisted of one day of in-clinic sessions per subject. Following block A, subjects entered block B of the study design, which consisted of 10 days of home-use. This period of home-use was followed by another in-clinic period of 3 days and 2 nights per subject (block C). Then, the study ended with block D, a second period of home-use of at least 20 days for each subject. In total, the study lasted for 18 months and globally the following study days (or runs) were acquired: 20 days in block A, 200 days in block B, 99 days in block C, and 753 days in block D, for a total of 1072 runs (data acquired during the two in-clinic nights were not used in our analysis). The median length of all runs in the study was between 10 and 11 h. The minimum, maximum, and median durations of the home-use study blocks across subjects was 11, 79, and 31 days for block B and 29, 183, and 119 for block D. The in-clinic study days, i.e., the runs belonging to blocks A and C, include the following glucose data: blood samples taken routinely every 10 to 20 min via a venous catheter for BG reference using a HemoCue Glucose 201 + (HemoCue, Sweden); regular SMBG samples taken via finger pricking about every 60 min using an Ascensia Contour (Bayer, Switzerland) glucose monitor. The home-use days, i.e., the runs belonging to blocks B and D, include only the SMBG measurements, which were taken in free-living conditions. In total, during the 4 study blocks, 3431 HemoCue and 13,338 SMBG measurements were collected. Moreover, for each run in blocks A, B, C, and D, the raw data collected with the Multisensor were available for processing. The Multisensor device, which has been extensively described in previous studies (see, e.g., [21] and references quoted in) and is shown in Figure 1, embeds dielectric spectroscopy and optical modules, as well as temperature, humidity, and sweat sensors, plus a 3-axes inertial sensor. All sensor signals were measured and recorded every 20 s. The capabilities of the Multisensor system have been expanded in a separate development in the last few years with new algorithmic routines allowing the monitoring of vital signs such as heart rate, pulse oximetry, and heart rate variability within an upper arm worn subsystem of the Multisensor, focusing around optical sensing and now commercially termed Everion ® [22,23].  The in-clinic study days, i.e., the runs belonging to blocks A and C, include the following glucose data: blood samples taken routinely every 10 to 20 min via a venous catheter for BG reference using a HemoCue Glucose 201 + (HemoCue, Sweden); regular SMBG samples taken via finger pricking about every 60 min using an Ascensia Contour (Bayer, Switzerland) glucose monitor. The home-use days, i.e., the runs belonging to blocks B and D, include only the SMBG measurements, which were taken in free-living conditions. In total, during the 4 study blocks, 3431 HemoCue and 13,338 SMBG measurements were collected. Moreover, for each run in blocks A, B, C, and D, the raw data collected with the Multisensor were available for processing. The Multisensor device, which has been extensively described in previous studies (see, e.g., [21] and references quoted in) and is shown in Figure 1, embeds dielectric spectroscopy and optical modules, as well as temperature, humidity, and sweat sensors, plus a 3-axes inertial sensor. All sensor signals were measured and recorded every 20 s. The capabilities of the Multisensor system have been expanded in a separate development in the last few years with new algorithmic routines allowing the monitoring of vital signs such as heart rate, pulse oximetry, and heart rate variability within an upper arm worn subsystem of the Multisensor, focusing around optical sensing and now commercially termed Everion ® [22,23].

Method
To retrospectively reconstruct the BG time-series from the raw Multisensor data and relative reference BG samples, we implemented a two-step procedure, illustrated in Figure 3. In step 1, the raw Multisensor data were transformed to a BG estimate through a multiple linear regression model. In step 2, an enhancing module was applied to improve the accuracy of the BG estimation obtained at step 1 by retrofitting available SMBG samples through a time-varying linear model. The two steps are described in detail in the following subsections.

Step 1: Multiple Linear Regression
The multiple linear regression step implements the same model proposed in recent literature to estimate the BG from the Multisensor data [16][17][18]: where y is the n × 1 vector collecting the BG reference samples, X is the n × p matrix containing the Multisensor measurements, β the p × 1 vector of model coefficients, and e the n × 1 error vector, assumed to be independent and identically distributed. The data collected in X depicted in Figure 3 (first panel), are characterized by high dimensionality and high correlation because most of the p channels of the Multisensor provide measurements of dielectric properties of the skin as a function of the frequency, thus exhibiting the correlated behavior typical of spectroscopy data. The problem of estimating β from Equation (1) is thus ill-conditioned, requiring a suitable regularization approach to control complexity and avoid overfitting. Here, we used the elastic net regularization, resulting in the following solution: where α is the hyper parameter balancing the linear and quadratic penalty terms (details on its estimation later in Section 2.3.1). The model coefficientsβ were estimated from the training set, and then used to obtain the BG glucose prediction,ŷ, on an independent test set (training and test sets will be defined in Section 2.3.1): The estimated BG profileŷ was then scaled to match the first BG reference as in [16][17][18] (see second panel of Figure 3, dots and black curve, vs. SMBG references, triangle and red curve, from a representative subject). Note that, as a preprocessing procedure, the first 75 min of each Multisensor channel were removed from each run, since an adaptation process due to the Multisensor/skin contact dominates this time interval. Also, the Multisensor measurements in X were standardized to have unitary standard deviation (in the test set, the standard deviation estimated from the training set was used).

Step 2: Enhancing Module
The enhancing module was applied in cascade to the multiple linear regression to obtain a more accurate glucose estimation, by exploiting the available SMBG references. In particular, given the BG estimation of step 1,ŷ, let z be the vector containing the samples inŷ corresponding to the time instants at which the SMBG references were collected. Then, the following model was used to relate the BG estimation given by the linear regression model to the reference BG: where u is the vector of SMBG reference values, w is the measurement noise, b and c are model parameters, and ∆t is the vector containing the times from sor application (in min) at which each SMBG was acquired. The model parameters b and c are estimated by least squaresens: Then, the enhanced BG estimation,ŷ en , is obtained as follows: The enhanced BG profileŷ en is shown in the third panel of Figure 3, blue curve, vs. the output of step 1 (ŷ, dots and black curve) and the SMBG references (triangle and red curve), for a representative subject. The enhancing module and, in particular, the use of the time-varying model of Equation (4) serves to adjust the point BG estimate by considering possible time-varying factors, e.g., environmental or non-glucose related phenomena that can influence the estimation. An example of this phenomenon is observable in Figure 3, second panel, where the BG estimation before the enhancing module shows a time-varying drift with respect to the reference BG. The application of the enhancing module, and, in particular, of the time-varying model of Equation (4), allows the compensation of such drift, as visible in the enhanced BG estimation profile reported in the bottom panel of Figure 3.  The enhancing module was applied in cascade to the multiple linear regression to obtain a more accurate glucose estimation, by exploiting the available SMBG references. In particular, given the BG

Dataset Subdivision into Training and Test Sets
To assess the method and fairly evaluate the performance, the entire dataset has been subdivided into training and test sets. The available data from blocks A, B, and C were used as a training set to estimate the regression model parametersβ of Equation (2). In particular, we used all BG references acquired either with the EmoCue instrumentation or with the standard SMBG finger prick device for the estimation (i.e., in Equation (1), vector y contains all available BG references in the 20 subjects and matrix X contains the Multisensor measurements at corresponding times). In addition, the training data were also used to estimate the regularization hyperparameter α in Equation (2) by minimizing the standard error as calculated during a 5-fold cross-validation internal to the training set (i.e., the training runs were divided into 5 folds; for each fold, the standard error was computed by applying the regression model estimated in the remaining 4 folds, for various values of α). The available data from block D (which, we remind, were data acquired during free-living conditions) were used as a test set to assess the method. Firstly, the linear regression step was applied to each run as in Equation (3). Then, the enhancing module was used as in Equation (6). The parametersb andĉ of the enhancing module were estimated, for each run, by fitting the vector u of available SMBG references. In particular, besides using all available SMBG references, we also tested different scenarios in which a different number of SMBG samples were considered (from 10 to 4 SMBG references per run), in order to assess the robustness of the method against the number of available reference points. To create the scenarios with fewer than the total number of available SMBG, we defined a uniform time grid with the desired frequency of SMBG per run (from 10 to 4 references per day) and selected the references closer to the defined sampling times.

Comparison with the Current State-Of-The-Art Method
To assess the performance of our two-step method, we compared the accuracy of the reconstructed BG time-series to that obtained with the method discussed in [16][17][18][19] (hereafter referred to as the baseline method). It is based on a multiple linear regression model (like that used in step 1 of our two-step method) plus a one-point calibration to adjust the offset of the estimated BG profile by adding a constant value. In formal terms, with all variables as defined previously and k calibration constant, the baseline method is represented by:ŷ = Xβ + k.
The value of the constant k was determined in a one-point calibration procedure by matching the first BG reference acquired after sensor application.
The baseline method as described in [16][17][18][19] is intended for real-time use. The comparison of our retrospective approach with this real-time approach would not be fair, since we would use more information than only one BG reference. Thus, to allow a fair comparison, we fed the baseline method with the same amount of information used by our method, i.e., in estimating k in Equation (7) we used the mean of all available BG references instead of using only the first BG collected. Note that, for the baseline method, the estimation of the regression model parametersβ was conducted as in step 1 of our two-step method, with the same training-test subdivision of the dataset.

Performance Metrics and Statistical Analysis
The performance of our method and of the baseline method was assessed in terms of accuracy by comparing the retrospectively reconstructed BG time-series under test and the SMBG references acquired at the same time instants, using standard metrics. In particular, we performed two different evaluations. When all SMBG references were used to retrospectively reconstruct the BG time-series, then the same SMBG points were also used for accuracy assessment (indeed, there are no other references Sensors 2019, 19, 3677 7 of 12 available to compute accuracy). When instead only a subset of the available SMBG references were used in the estimation, accuracy was computed in the remaining subset.
For each run, we computed the mean absolute difference (MAD) and the MARD [24]. We then computed the percentage of data matching the SMBG standard International Organization for Standardization (ISO) 15197:2013 [25], i.e., the percentage of data falling within either 15 mg/dL from the reference measurement if the reference was lower than 100 mg/dL or within 15% of reference if reference was above 100 mg/dL (15/15%). We also considered two other ranges, 20/20% and 30/30%. Moreover, accuracy was assessed by computing for each run the percentage of data falling in zone A and zone B of the Clark Error Grid (CEG) [26]. Finally, the population mean (standard deviation) was reported for normally distributed metrics and population median [interquartile range] was reported for non-normally distributed metrics. Normality was assessed by the Lilliefors test.
The statistical significance of the differences in performance metrics obtained with our method and with the baseline method was determined by a Wilcoxon signed-rank test for non-normally distributed data and by a paired t-test for normally distributed data. In particular, we tested the null hypothesis that the median/mean (in case of non-normally/normally distributed data) difference between the paired values of the two groups was zero, with a significance level of 0.05. Table 2 shows the performance metrics of the new method compared to the baseline method when all available SMBG references were used to retrospectively reconstruct the BG time-series. The average number of SMBG samples used was 10 per run. The new method shows superior performance compared to the baseline method for all the considered metrics. Also, the improvement brought by the new method was statistically significant for all performance metrics (p-values < 10 −3 , not shown). In particular, focusing on metric MARD, which is one of the most popular metrics used in recent literature, a significant improvement from 25% to 20% (p-value < 10 −10 ) was observed. A boxplot of the MARD distribution in the population is reported in Figure 4a, where the left boxplot refers to the baseline method and the right boxplot to the new method, showing a significant MARD reduction. This global performance improvement was also observable at a single-run level.

CEG-A + B (%)
Baseline method 10 [3] 29 [16] 25 [14] 40 [27] 54 [28] 75 [29] 54 [29] 92 [26] New method 10 [3] 24 [13] 20 [12] 50 [28] 64 [28] 82 [19] 64 [28] 94 [25] In Figure 4b, we depicted how the MARD changes in each single run under test when passing from the baseline method to the new method. In particular, in the left plot we reported (in red) the runs in which the MARD does not improve with the new method, while in the right plot we reported (in green) the runs that show a MARD improvement with the new method. As clearly observable from the figure, there are only a few runs (about 5% of all test runs) in which the MARD shows a very slight deterioration (runs in left panel, in red). On the contrary, the majority of the test runs (runs in right panel, in green) shows a significant improvement with the new method.  In Figure 5, we reported the estimated BG time-series versus the reference SMBG measurements in a representative subject and run. It is observable how the BG estimation obtained with the new method (blue line) is much closer to the reference SMBG samples (red line) than the BG estimation obtained with the baseline method (black line). Indeed, MARD values of this specific run and subjects were 35% and 25% with the baseline and new method, respectively. In Figure 5, we reported the estimated BG time-series versus the reference SMBG measurements in a representative subject and run. It is observable how the BG estimation obtained with the new method (blue line) is much closer to the reference SMBG samples (red line) than the BG estimation obtained with the baseline method (black line). Indeed, MARD values of this specific run and subjects were 35% and 25% with the baseline and new method, respectively. After having demonstrated the superiority of the present method compared to the state-of-theart baseline method, we also assessed the robustness of the new method against the number of SMBG references used to retrospectively reconstruct the BG time-series. In particular, we implemented the After having demonstrated the superiority of the present method compared to the state-of-the-art baseline method, we also assessed the robustness of the new method against the number of SMBG references used to retrospectively reconstruct the BG time-series. In particular, we implemented the enhancing module (step 2 of the procedure described in Section 3) by varying the number of SMBG references from 10 to 4 per run.
Results of this assessment are reported in Table 3. Although, as expected, there was a slight deterioration of performance when reducing the number of SMBG references used for the estimation, the method appears relatively robust against the number of references available.
Indeed, the performance of the new method was always superior to that of the baseline method, independently from the number of SMBG references used (see the comparison reported for each performance metric in Table 3). Also, with the new method, even in the case in which only four SMBG samples were used, 90% of the estimations were in zone A or B of the CEG.

Discussion
Wearable non-invasive technologies for blood glucose monitoring, although widely investigated, still show important accuracy issues that hinder their use in a prospective real-time scenario. One of the major issues that contribute to sensor inaccuracy is the influence of external perturbations, e.g., environmental factors and non-glucose-related physiological confounders, on sensor measurements. The Multisensor system aims at mitigating the effect of such perturbing factors by embedding both glucose and non-glucose-related sensors in the same device and compensating for those via a multiple linear regression model [16][17][18][19]. We refer to this approach as the baseline method for BG estimation with the Multisensor.
A recent study showed that the Multisensor accuracy in estimating BG in free-living conditions and in a fully prospective scenario was 35.4% MARD [18]. The same prospective scenario, when applied in strictly controlled conditions, showed instead 21.1% MARD [19], confirming that the environmental and physiological perturbations typical of the uncontrolled conditions significantly impact sensor accuracy and thus need more sophisticated compensation.
Acknowledging the current existing accuracy limitations in utilizing the Multisensor for prospective (i.e., online) BG estimation, we analyzed here the less challenging, but still interesting, scenario of retrospective (i.e., offline) BG estimation. A retrospective approach allows the use of a few BG references to match the Multisensor estimation and compensate for those sources of bias (caused by physiological and/or environmental factors) that we may currently not be able to capture/describe with the Multisensor measurements/models.
In the retrospective approach presented in this paper, we have developed an enhancing module to be applied in cascade to the Multisensor regression model. The enhancing module consists of a time-varying linear model fit to a few SMBG references per day. To compare our results with the baseline method, we considered a retrospective implementation of the baseline method where we used exactly the same amount of information to feed the models, i.e., the same Multisensor input and the same SMBG references. The new method here proposed shows improved accuracy compared to the retrospective implementation of the baseline method, independently from the number of SMBG references used. When an average of 10 SMBG references per day was used as input data, MARD was reduced from 25% (baseline method) to 20% (new method). Also, the new method maintains superior performance compared to the baseline method when the number of SMBG inputs was reduced from 10 to 4 per day (30% MARD vs. 25% MARD).
The increased accuracy brought by the enhancing module attached in cascade to the regression model came at the cost of no more prospective applicability. However, we believe that a retrospective BG estimation from a non-invasive wearable device can still be useful for various applications. First of all, to estimate quality of glycemic control, magnitude of glycemic variability, daily patterns of hypoand hyperglycemia, and time of day with the highest risks of hypo-or hyperglycemia events. Secondly, the retrospectively estimated BG profile and relative identified patterns can be subsequently used to make long-term changes in diet, medications, insulin, and physical activity [27]. Such applications are of interest not only for people with diabetes, but also to guide prediabetic or nondiabetic people through life-style changes towards healthier habits [28].

Conclusions
Given their potential advantages in terms of economic cost and user acceptability, NI-CGM devices can represent an interesting tool to investigate. In recent studies, encouraging results of NI-CGM have been shown in controlled situations, but their use in free-living conditions still represents a challenge, mostly because of inaccuracy problems related to the influence of intrinsic and extrinsic perturbing factors that can have adverse effects on the sensor measurements.
In the present study we assessed the accuracy of a multisensor device for NI-CGM in the less challenging, but still useful application, of retrospective, off-line, BG estimation. We proposed a two-step procedure in which, at step 1, the Multisensor measurements were combined into a BG estimate by multiple linear regression, and at step 2 an enhancing module, which represents the major novelty of this work, was used to improve the BG estimation. The method proved effective in retrospectively reconstructing the BG time-series, showing improved performance compared to a current state-of-the-art method used as a baseline for comparison. In particular, MARD was reduced from 25% of the baseline method to 20% of the new method.
Although the point accuracy was still significantly worse than that of minimally invasive CGM sensors, the BG estimations obtained by retrospectively processing the Multisensor data could be combined and used as additional and complementary information to the vital signs already provided by Everion ® . It provides medical grade heart rate, blood oxygenation and perfusion, respiration rate, and further derived parameters [22,23,29], with the aim of facilitating effective glucose control in patients with diabetes but also in prediabetic or nondiabetic users that can benefit from this additional information for life-style changes, such as diet, weight loss programs, or physical exercise.
Funding: This research received no external funding.