Blood Glucose Level Regression for Smartphone PPG Signals Using Machine Learning

Abstract: Diabetes is a chronic illness that affects millions of people worldwide and requires regular monitoring of a patient's blood glucose level. Currently, blood glucose is monitored by a minimally invasive process in which a small droplet of blood is extracted and passed to a glucometer; however, this process is uncomfortable for the patient. In this paper, a smartphone video-based noninvasive technique is proposed for the quantitative estimation of glucose levels in the blood. The videos are collected steadily from the tip of the subject's finger using smartphone cameras and subsequently converted into a Photoplethysmography (PPG) signal. A Gaussian filter is applied on top of the Asymmetric Least Squares (ALS) method to remove high-frequency noise, optical noise, and motion interference from the raw PPG signal. These preprocessed signals are then used for extracting signal features such as systolic and diastolic peaks, the time differences between consecutive peaks (DelT), first derivative peaks, and second derivative peaks. Finally, the features are fed into Principal Component Regression (PCR), Partial Least Squares Regression (PLS), Support Vector Regression (SVR), and Random Forest Regression (RFR) models for the prediction of glucose level. Out of the four statistical learning techniques used, the PLS model, when applied to an unbiased dataset, has the lowest standard error of prediction (SEP) at 17.02 mg/dL.


Introduction
Diabetes is an incurable chronic disease that occurs either when the pancreas is no longer able to produce insulin, or when the body is unable to utilize insulin properly [1,2]. This results in poor regulation of blood glucose level, which can lead to severe health complications such as chronic heart and kidney disease if blood glucose levels are not monitored carefully. According to the IDF (International Diabetes Federation) Diabetes Atlas [3] published in 2019, 463 million people aged 20-79 have diabetes, and this number is projected to reach 700 million by 2045. More than 4.2 million people died due to diabetes-related disorders in 2019.
The conventional approach to glucose level monitoring requires several apparatuses such as a glucometer, a one-time test strip, and a single-use lancet or lancing device to draw blood [4]. It also requires alcohol pads, gloves, and a band-aid to reduce the risk of infection for the patient. It is an uncomfortable process and one that people with diabetes need to use to monitor their blood glucose level regularly.
There is a demand for wearable, noninvasive, and smart health monitoring systems [5,6]. Previously, there have been studies conducted on noninvasive [7] or minimally invasive wearable devices based on different types of technologies for blood glucose level monitoring [8]. Some of the researchers used sensors built using carbon nanotubes [9,10], fluorescence [11], and plasmonic materials [12] to determine the glucose level. These techniques had some issues such as long calibration processes, size, and susceptibility to autofluorescence.
Several electrical techniques have been proposed for blood glucose level monitoring. Shaker, G. et al. experimented with a millimeter-wave radar system to track glucose level changes [13]. Similarly, Hanna, J. et al. experimented with glucose monitoring using electromagnetic devices in a noninvasive manner [14]. Other researchers utilized changes in bioimpedance [15] and ultrasound signal [16,17] to determine a subject's blood glucose level. In general, electrical techniques have room for improvement due to poor selectivity, temperature sensitivity, and skin irritation.
A handful of previous studies have explored the potential of optical techniques in tracking changes in blood glucose level. Phan, Quoc-Hung [18] studied a polarimetric optical measurement method for noninvasive glucose determination. Optical Coherence Tomography (OCT) [19], near-infrared spectroscopy [20], and mid-infrared (MIR) spectroscopy [21] have also been utilized under laboratory conditions to gauge changes in blood glucose level in diabetic subjects. Unfortunately, optical techniques suffer from a significant lag time of up to 30 min as well as high temperature sensitivity. It is noteworthy that these proposed techniques require intricately organized and expensive laboratory equipment, making it unlikely for them to be used for daily measurement of blood glucose by millions of diabetics without significant reduction in price and complexity.
Sensor-based noninvasive techniques have also been proposed for this purpose. Studies have shown the possibility of using GSR (Galvanic Skin Response) sensors [22], saliva [23,24], tear-based sensors [25], and sweat-based sensors [26] in order to measure blood glucose level. These techniques are generally inexpensive and require biological fluid samples other than blood to be collected, thereby reducing the invasiveness of the procedure and the subject's discomfort.
Photoplethysmography (PPG) is a low-cost, noninvasive technique that measures the volumetric change of blood in the arteries. Researchers have previously investigated the estimation of heart rate using PPG signals acquired by various methods such as endocavitary sensors [27] and pressure sensors [28]. Some researchers determined the glucose level using smartphone-based colorimetric and electrochemical methods [29]; this required an external apparatus to collect the sample before it was analyzed using a smartphone. We also previously demonstrated that PPG could be a useful tool for blood glucose estimation [30].
A preliminary study conducted by Zhang, G. et al. proposed a subspace KNN (K-Nearest Neighbors) based method for classifying between diabetic/nondiabetic patients by using a PPG signal acquired from a single smartphone [31]. Using a binary classifier, this study constructed two classes in the (70 mg/dL-130 mg/dL) blood glucose range. However, the reported approach achieved a classification accuracy of 86.2% and used a Butterworth filter for denoising motion artefacts.
This study outlines the initial development steps towards a new reliable technique for blood glucose level estimation using a smartphone camera. A smartphone camera is used to record multiple videos from the same position of the fingertips for fifty-two subjects. A Gaussian filter is applied to denoise the raw PPG waveform generated, and the Asymmetric Least Squares (ALS) method is used to remove baseline wander. Attributes such as systolic and diastolic peaks, the peak-to-peak time interval (DelT), first derivative peaks, and second derivative peaks are extracted from the denoised signal. These extracted features are subsequently fed into four statistical learning approaches, namely Principal Component Regression (PCR), Partial Least Squares Regression (PLS), Support Vector Regression (SVR), and Random Forest Regression (RFR), for the prediction of glucose level in blood. The paper focuses on data acquisition techniques, on noise removal, which is a major challenge in smartphone-based data acquisition systems, and on the development of algorithms that offer greater acquisition capability, comfort, and flexibility of measurement than existing systems.

Experiment Details
The complete experimental procedure of noninvasive sensing, together with a comparison with the conventional method of glucose level prediction, is depicted in Figure 1. With the proposed noninvasive technique, a commercial smartphone camera records a short video (20 s-50 s) of the subject's fingertip, which is then converted into frames containing RGB channel information at different wavelengths. Red light has a longer wavelength than green or blue light, which allows it to penetrate deeper into tissue. Consequently, data from the red channel can be used for generating the respective PPG signal from the video frames. Regression models can then be trained with the acquired smartphone PPG signals, using the corresponding glucose levels acquired with a commercially available glucometer as reference values.
Fifty-two subjects aged between 17 and 61 years, with a male to female ratio of 75:25, and with blood glucose levels ranging between 68 and 211 mg/dL participated in this experiment. The participants were adequately briefed about the procedure before it began. A minimum of three trials were taken from each subject, resulting in one hundred and ninety-one trials altogether. Each video recorded during the trials was 60 s long. All subjects provided written informed consent, and proper hygiene was maintained during the collection of reference values using a commercially approved glucometer (Accu-Chek Active), whose accuracy meets the requirements of DIN EN ISO 15197:2013 and whose measuring interval is 10-600 mg/dL. The subjects' fingers and smartphone camera lenses were also cleaned using alcohol wipes before acquiring video as part of this study.

Data Acquisition with Smartphone
The camera quality of commercially available smartphones has improved considerably over the last decade and offers excellent potential utility as a biomedical apparatus for data acquisition [32]. Nearly all smartphones can be used for the capture of PPG signals. However, the quality of the acquired signal can vary widely depending on the resolution and the number of frames recorded by the camera per second. It is possible to extract PPG signals from a subject's fingertips or earlobe. In our study, the fingertip was chosen for data acquisition because it is more convenient and comfortable for the subject. Olive et al. have previously shown earlobe pulse oximetry to be less reliable than fingertip pulse oximetry [33]. During the recording, the subject was asked to remain relatively still and comfortably place their index finger on the smartphone camera. Video data were subsequently collected for 60 s, with the participant's right index finger placed on the horizontal camera array and the LED (Light Emitting Diode) flash. The duration of 60 s allowed for a greater likelihood of obtaining acceptable PPG data, and left enough time at the beginning and the end of the segment to absorb the motion artifacts that are common in biomedical signal measurements. A maximum of five trials were collected from each participant. Each video was captured using the default camera application of the smartphone at 30 frames per second (fps) with a resolution of 1280 × 720 pixels.
Appl. Sci. 2021, 11, 618
To enhance the generalizability of the trained regression models, and to test the effectiveness of our denoising protocol, we initially shortlisted four smartphones to collect data in this study. Three of these phones were on Android platforms, whereas the fourth was on iOS. The four phones ran different camera software, and their built-in cameras and optical setups offered a wide range of specifications for data collection. After a preliminary study on the performance of each phone, the data collection was narrowed down to two phones: one based on an iOS platform and a second on Android.

Acquisition with iPhone
In the iOS category, an iPhone 7 Plus (Made by Apple Inc, San Francisco, CA, USA) phone was used for noninvasive data collection. The imaging unit in the device consists of a WLED (White LED) as the illumination source next to a 12-megapixel camera at a center-to-center separation of around 5 mm. The phone supports color video recording at about 30 fps up to a resolution of 3840 × 2160 pixels. For this study, the fingertip was recorded at 30 fps at 720p resolution. This did not reduce signal quality compared with 1080p or 4K recording but took up significantly less phone memory and data transfer time.

Acquisition with Android Phone
For the Android platform, the choice of smartphone is more complex. Android devices come in a wide range of hardware specifications regarding the built-in camera and the optical setup. Therefore, the quality of acquired data can vary significantly between models from different manufacturers. We therefore conducted a preliminary study (Supplementary Material Figure S1) to evaluate the performance of data acquisition on three smartphones with varying price ranges and camera specifications.
For the preliminary study, three trials were completed from five subjects using a Xiaomi Redmi Note 5 Pro (Xiaomi, Beijing, China), Samsung Galaxy Note 8 (Made by Samsung, San Jose, CA, USA), and OnePlus 6T (Made by OnePlus, Guangdong, China). The camera sensors on the Redmi Note 5 Pro and the Galaxy Note 8 are manufactured by Samsung, whereas the OnePlus 6T has a Sony IMX camera sensor. All of them have a dual camera setup with an illumination source in the form of a WLED flash placed next to the cameras. For this portion of the study, the video data were recorded at 30 fps at a resolution of 1920 × 1080 pixels. Detailed specification for each of the phones is provided in Table 1. The Aperture column of Table 1 denotes the focal length of the smartphone camera.
The quality of the PPG signal acquired from the same subject using three different smartphones is shown in Figure 2. In the first two signals, considerably more baseline variation can be observed in contrast with the third signal. To further evaluate the quality of the signals acquired by these phones, we attempted to extract the features from the acquired signals. For this, the signals were first preprocessed (Supplementary Material Figure S2) to remove high-frequency noise and baseline drift. Then, peak analysis was performed on the preprocessed PPG (Supplementary Material Figure S3) and on the 1st derivative and 2nd derivative (Supplementary Material Figure S4) of the signals. From Supplementary Material Table S1, it can be seen that the model was able to detect the 2nd derivative peak with very few misses and false positives across all the subjects. We performed peak detection and observed the results as shown in Figure 2. With 5 different subjects and multiple trials with the three candidate smartphones, we could see that the Xiaomi was outperformed by the others in terms of accuracy.
Even though the acquisition quality of the Samsung Galaxy Note 8 and the OnePlus 6T is comparable, the Samsung Galaxy is more expensive while offering very similar specifications. Consequently, the OnePlus 6T was chosen as the Android platform for data acquisition in the study of glucose level detection. Further details on this study are available in the Supplementary resources.
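The peak analysis used to compare the candidate phones can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the 0.4 s minimum spacing between peaks, and the mean-amplitude gate are assumptions chosen only for illustration, and the input is a synthetic pulse rather than recorded data.

```python
import numpy as np


def find_systolic_peaks(ppg, fps=30.0, min_interval_s=0.4):
    """Return indices of local maxima that are at least min_interval_s apart."""
    x = np.asarray(ppg, dtype=float)
    inner = x[1:-1]
    # Candidate peaks: higher than the left neighbour, not lower than the right one.
    candidates = np.flatnonzero((inner > x[:-2]) & (inner >= x[2:])) + 1
    # Assumed amplitude gate: keep only candidates above the signal mean.
    candidates = candidates[x[candidates] > x.mean()]
    min_gap = max(1, int(round(min_interval_s * fps)))
    kept = []
    # Visit the tallest candidates first so that weaker close neighbours are dropped.
    for idx in candidates[np.argsort(x[candidates])[::-1]]:
        if all(abs(int(idx) - p) >= min_gap for p in kept):
            kept.append(int(idx))
    return np.array(sorted(kept), dtype=int)


def peak_to_peak_intervals(peaks, fps=30.0):
    """DelT: time difference in seconds between consecutive peaks."""
    return np.diff(peaks) / fps


def derivatives(ppg, fps=30.0):
    """First and second time derivatives of the waveform."""
    first = np.gradient(np.asarray(ppg, dtype=float), 1.0 / fps)
    second = np.gradient(first, 1.0 / fps)
    return first, second


# Synthetic 10 s pulse at 72 beats per minute, sampled at 30 fps.
t = np.arange(300) / 30.0
demo = np.sin(2 * np.pi * 1.2 * t)
demo_peaks = find_systolic_peaks(demo)
demo_delt = peak_to_peak_intervals(demo_peaks)
demo_d1, demo_d2 = derivatives(demo)
```

Counting how many detected peaks are missing or spurious relative to a manual annotation would give the kind of per-phone comparison described in the text.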

Channel Selection
The video recorded from the subjects was stored in the phone's memory in MPEG-4 (Moving Picture Experts Group) format. As discussed earlier, acquired smartphone videos were first converted into RGB frames containing information from three color channels (red, green, and blue) with varying wavelengths. Channel selection for obtaining a good quality PPG has been a challenge for researchers and has therefore been studied extensively in the past [34]. Blood perfusion variations depend on the wavelength of light, since radiation of different wavelengths penetrates the skin layers and reaches the vascular bed to varying depths. Red light, which has a wavelength of 620 nm, reaches deeper (several centimeters) blood vessels in contrast to blue light (432 nm), which penetrates less than 1 mm deep. The red channel was often a default choice [35,36]. On the other hand, the authors of [34,37] claimed that the green color signal provided the best signal amplitude values for smartphones and considered this as more suitable than red and blue. Therefore, the authors of [34,38] conducted several experiments and compared the PPG signal strength in the three color bands, with and without a flashlight, to obtain good quality PPG. Grimaldi [39] reported that the distribution of the pixels in the green channel is not uniform for different smartphone models. Bolkhovsky [40] suggested different channels for different phones. All these experiments show that the selection of channel may vary based on several factors, including the phone model (due to camera characteristics), the acquisition technique (with or without LED), and the region of interest. In the proposed system, we initially extracted a PPG signal from all three channels. In most cases, PPG extracted from the red channel was more prominent and less noisy than that of the green and blue channels, as seen in Figure 3.

Conversion into PPG Waveform
In a given frame, each pixel has slightly varying values. We averaged the red channel values of an entire frame to obtain one sample point of the respective PPG waveform. After extracting the sampling points from all the frames of recorded videos and ordering them in an array, complete PPG waveforms could be generated. The computation was performed using a custom-written MATLAB program. Regions of Interest (ROIs), where applicable, were removed from the frames before conversion.
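The frame-averaging step can be sketched in Python as follows. The authors used a custom MATLAB program, so this is only an illustrative equivalent: the function name `frames_to_ppg`, the assumed array layout (frames, height, width, channel), and the RGB channel order are assumptions. Decoding the MPEG-4 file into frames is left out; note that some decoders return channels in BGR order, in which case the channel index would change.

```python
import numpy as np


def frames_to_ppg(frames, channel=0):
    """Average one colour channel over every frame to get one PPG sample per frame.

    frames: array of shape (n_frames, height, width, 3), assumed to be in RGB order.
    channel: 0 for red under that assumption.
    """
    frames = np.asarray(frames, dtype=float)
    if frames.ndim != 4 or frames.shape[-1] != 3:
        raise ValueError("expected frames with shape (n_frames, height, width, 3)")
    # Mean over the two spatial axes leaves a 1-D series indexed by frame number.
    return frames[..., channel].mean(axis=(1, 2))


# Synthetic 3 s clip at 30 fps: the red channel carries a 72 bpm pulse,
# while the green and blue channels hold unrelated random values.
rng = np.random.default_rng(0)
pulse = 180.0 + 5.0 * np.sin(2 * np.pi * 1.2 * np.arange(90) / 30.0)
frames = rng.uniform(0.0, 255.0, size=(90, 4, 4, 3))
frames[..., 0] = pulse[:, None, None]
ppg = frames_to_ppg(frames)
```

Calling `frames_to_ppg(frames, channel=1)` or `channel=2` would give the green and blue waveforms for the kind of channel comparison described under Channel Selection.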

Signal Quality
Experimental investigation showed that the best way to collect video data without much motion is to let the subject hold the phone in their hand and then place their fingertip on the camera sensor. This introduces less motion interference, as people naturally hold a phone in this way. Other approaches were also attempted, but they caused considerable movement of the subject's finger during data collection. Even a little distortion may result in scattered plots, and the acquired signals may suffer from severe baseline variations. An example of such an event is shown in Figure 4. Videos were also captured with the flash on and off. PPG signals obtained with a flash were much cleaner than without flash, and the red channel yielded the best overall output (Supplementary Material Figure S5).
Even after careful consideration of the type of camera to be used and the orientation of the fingertip placement, the acquired signals varied in quality. This is evident in Figure 4, which provides three sample signals collected from different subjects. It can be seen that one of them is quite clean, another has a few baseline variations, and the last one contains high-frequency noise and baseline variations. The accuracy of the PPG signal is highly influenced by motion artifacts, which result in inaccurate calculation of vital features. These motion artifacts, baseline wander, and high-frequency noise are due to the unavoidable disturbance of the contact between the smartphone camera and the finger, and to voluntary or involuntary subject movement, including respiration and compression between the finger and the smartphone camera. Therefore, it is crucial to develop a reliable series of preprocessing steps to enhance the uniformity and reliability of PPG signal acquisition.

Invasive Data Acquisition Methodology
During the study, actual blood sugar data were simultaneously collected as a reference from the subject invasively using a commercially available medical standard glucometer. The meter can measure the sugar level from a small drop (1-2 µL) of blood collected from the fingertip of the subject. Its measuring range is 10-600 mg/dL. This technique also requires a one-time test strip and single-use safety lancets in addition to the meter. First, the subject's hand was cleaned with a one-time hand sanitizer, and an alcohol solution was applied to the finger from which the blood sample was taken. Then, a test strip was inserted into the glucometer to prepare the device for measuring the glucose level. The blood sample was collected by pricking the side of any finger using a lancet. Then, the pricked finger was touched to and held on the edge of the test strip to transfer the blood drop from the finger to the test strip. The pricked finger was treated with a band-aid. The glucometer usually takes 5-8 s to process the sample and displays the glucose level in mg/dL on the screen. At least one trial was collected from each subject to acquire the reference value.

Methodology
The proposed approach for the noninvasive estimation of blood glucose level using a smartphone video is explained through a flowchart in Figure 5. Video data were first obtained from the subjects' fingertips and then converted into the respective PPG waveform. This signal contains noise and motion artefacts, which were removed in the preprocessing steps. Relevant features were extracted from the preprocessed signals. These features were then fed into a regression model. The regression model was trained using the extracted features and reference glucose values, collected invasively using a commercially available glucometer. Then, the trained model was used to estimate the glucose level of an unknown sample. The unknown samples used for testing went through the same preprocessing and feature extraction steps as in the training phase. The performance of the model was evaluated by calculating the Standard Error of Prediction (SEP), measured in mg/dL.
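The SEP calculation can be sketched in Python. The text here does not give the formula, so the definition below is an assumption: the bias-corrected form common in chemometrics, $\mathrm{SEP} = \sqrt{\sum_i (e_i - \bar{e})^2 / (n - 1)}$ with $e_i = \hat{y}_i - y_i$, with the plain root-mean-square error available as an alternative. The function name and the three glucose values are made up for illustration.

```python
import numpy as np


def standard_error_of_prediction(reference, predicted, bias_corrected=True):
    """Standard error of prediction, in the units of the inputs (mg/dL here).

    bias_corrected=True  -> sqrt(sum((e - mean(e))**2) / (n - 1))  (assumed definition)
    bias_corrected=False -> sqrt(mean(e**2)), i.e. the plain RMSE
    """
    ref = np.asarray(reference, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    if ref.shape != pred.shape or ref.size < 2:
        raise ValueError("need two equally shaped arrays with at least two samples")
    errors = pred - ref
    if bias_corrected:
        return float(np.sqrt(np.sum((errors - errors.mean()) ** 2) / (errors.size - 1)))
    return float(np.sqrt(np.mean(errors ** 2)))


# Illustrative values only (mg/dL); these are not measurements from the study.
glucometer = [100.0, 120.0, 140.0]
estimated = [110.0, 115.0, 145.0]
sep = standard_error_of_prediction(glucometer, estimated)
rmse = standard_error_of_prediction(glucometer, estimated, bias_corrected=False)
```

Whichever definition is used, it should be applied only to samples that were held out from training, as described in the text.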

Gaussian Filter
Signal cleaning can be completed using a variety of different filters. In our work, a Gaussian filter was used for high-frequency noise removal. A Gaussian filter [41] is a nonuniform lowpass filter, and consequently is a popular general-purpose choice for removing the high-frequency components of a PPG signal. This filter also performs remarkably well as a bandpass filter in the frequency domain when it is tuned through its center frequency (mean) and standard deviation. The Gaussian filter used for noise removal is given in Equation (1), in which the standard deviation can be adjusted to set the degree of smoothing. The function performed best for smoothing the PPG signal when the center frequency was set to c = 0.068 and the standard deviation was set to w = 0.0543. A sample PPG signal with high-frequency noise, cleaned by applying the Gaussian filter, is given in Supplementary Material Figure S6.
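A frequency-domain Gaussian filter of this kind can be sketched as below. This is an illustration under stated assumptions rather than the exact form of Equation (1): the gain is taken to be exp(-(f - c)^2 / (2 w^2)), and c and w are assumed to be expressed in cycles per sample, since the unit is not stated here. The function name `gaussian_frequency_filter` is hypothetical.

```python
import numpy as np

def gaussian_frequency_filter(signal, center=0.068, width=0.0543):
    """Weight the spectrum by exp(-(f - center)^2 / (2 * width^2)).

    Assumption: `center` and `width` are in cycles per sample, so f runs
    from 0 to 0.5 for a real-valued signal.
    """
    signal = np.asarray(signal, dtype=float)
    freqs = np.fft.rfftfreq(signal.size)  # normalized frequencies
    gain = np.exp(-((freqs - center) ** 2) / (2.0 * width ** 2))
    return np.fft.irfft(np.fft.rfft(signal) * gain, n=signal.size)

# Synthetic example: a slow pulse-like wave plus fast noise.
n = np.arange(600)
clean = np.sin(2 * np.pi * 0.05 * n)
noisy = clean + 0.5 * np.sin(2 * np.pi * 0.4 * n)
smoothed = gaussian_frequency_filter(noisy)
```

With these settings the slow component near the center frequency passes almost unchanged, while the component at 0.4 cycles per sample is suppressed almost entirely.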

Asymmetric Least Squares
ALS is a useful baseline correction technique [42]. Motion interference was removed with the help of ALS, which is well suited to correcting data with relatively narrow peaks. ALS also makes the systolic and diastolic peaks more prominent, which are crucial attributes for feature extraction. The baseline correction through ALS improved the overall signal quality, as the fluctuations between peaks were decreased.
However, some high-frequency noise could still be observed in signals processed through ALS, as seen in Supplementary Material Figure S7. ALS was therefore followed by smoothing with the Gaussian filter to remove the remaining high-frequency noise. A preprocessing model with ALS as the first stage and the Gaussian filter as the second stage significantly improves the quality of the acquired PPG signals. This can be seen in Figure 6, which shows a raw PPG signal cleaned with the Gaussian filter and corrected using ALS. All the signals were normalized from 0 to 1.
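The baseline correction step can be illustrated with the widely used iterative ALS formulation. The sketch below is not the authors' implementation: the smoothness parameter `lam`, the asymmetry `p`, the iteration count, and the helper names `als_baseline` and `normalize01` are all assumptions chosen for the synthetic example.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

def als_baseline(y, lam=1e6, p=0.01, n_iter=10):
    """Estimate a smooth baseline lying under the peaks of y."""
    y = np.asarray(y, dtype=float)
    n = y.size
    # Second-order difference operator; lam * D D^T penalizes roughness.
    D = sparse.diags([1.0, -2.0, 1.0], [0, -1, -2], shape=(n, n - 2))
    penalty = lam * (D @ D.T)
    w = np.ones(n)
    for _ in range(n_iter):
        W = sparse.diags(w, 0)
        z = spsolve(sparse.csc_matrix(W + penalty), w * y)
        # Points above the baseline (peaks) get weight p, the rest 1 - p.
        w = np.where(y > z, p, 1.0 - p)
    return z

def normalize01(x):
    """Rescale a signal linearly to the range [0, 1]."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

# Synthetic example: narrow peaks riding on a drifting baseline.
t = np.arange(1000)
drift = 5.0 * t / t.size
peaks = sum(np.exp(-0.5 * ((t - c) / 5.0) ** 2) for c in range(50, 1000, 100))
corrected = (drift + peaks) - als_baseline(drift + peaks)
scaled = normalize01(corrected)
```

The asymmetric weights are what keep the fitted baseline below the narrow peaks instead of passing through them.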

Feature Extraction
After applying the preprocessing steps to the PPG signals, certain features were extracted from the signals. Multiple feature vectors were created through different combinations of these features, which were then used in training the regression models. A feature can be any single point in the signal, or several pieces of information combined into one feature. For this study, we extracted the following features from the PPG signal: systolic peaks, diastolic peaks, the time separation between them (DelT), first derivative peaks, and second derivative peaks.

Systolic and Diastolic Features
The direct pressure wave travelling from the left ventricle to the periphery of the body represents the systolic peak. On the other hand, the reflections of the pressure wave by arteries of the lower body represent the diastolic peak. Each cycle of the PPG signal contains one systolic and one diastolic peak as seen in Figure 7a. These peak locations were extracted as features of the PPG signal.
DelT refers to the time separation between systolic and diastolic peaks. It was computed from each systolic-diastolic peak pair, and the DelT values found in each cycle of the preprocessed PPG signal were stored in an array. The systolic, diastolic and DelT features are presented in a sample PPG signal in Figure 7a.
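The extraction of systolic peaks, diastolic peaks and DelT can be sketched with `scipy.signal.find_peaks`. The peak-picking heuristic (height threshold, minimum spacing, and taking the first local maximum after each systolic peak as the diastolic peak) is an assumption made for this illustration, as are the 30 frames per second sampling rate and the function name `systolic_diastolic_delt`.

```python
import numpy as np
from scipy.signal import find_peaks

def systolic_diastolic_delt(ppg, fs):
    """Return systolic indices, diastolic indices and DelT (s) per cycle."""
    ppg = np.asarray(ppg, dtype=float)
    # Heuristic (assumption): systolic peaks are tall and at least 0.4 s apart.
    systolic, _ = find_peaks(ppg, height=0.75 * ppg.max(), distance=int(0.4 * fs))
    sys_idx, dia_idx = [], []
    bounds = list(systolic[1:]) + [ppg.size]
    for start, stop in zip(systolic, bounds):
        # Diastolic peak: first local maximum after the systolic peak.
        local, _ = find_peaks(ppg[start + 1:stop])
        if local.size:
            sys_idx.append(start)
            dia_idx.append(start + 1 + local[0])
    sys_idx, dia_idx = np.array(sys_idx), np.array(dia_idx)
    return sys_idx, dia_idx, (dia_idx - sys_idx) / fs

# Synthetic 10 s signal: each beat is a tall systolic bump followed
# 0.3 s later by a smaller diastolic bump.
fs = 30
t = np.arange(10 * fs) / fs
ppg = np.zeros_like(t)
for beat in np.arange(0.5, 10.0, 1.0):
    ppg += np.exp(-0.5 * ((t - beat) / 0.06) ** 2)
    ppg += 0.5 * np.exp(-0.5 * ((t - beat - 0.3) / 0.08) ** 2)
sys_idx, dia_idx, delt = systolic_diastolic_delt(ppg, fs)
```

Real fingertip recordings often lack a clear diastolic maximum, so a practical implementation would need a fallback for cycles where no second peak is found; the sketch simply skips them.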

First Derivative Features
The first derivative is the instantaneous rate of change: it indicates whether the function is increasing or decreasing, and how fast. Taking the first derivative increased the number of peaks (features) available from the PPG signal, since PPG is a periodic function. To determine the first derivative of the PPG signal, a two-point central difference method was used as shown in Equation (2), where a(j) comes from the existing preprocessed array and d(j) is stored in a new array. Since the first derivative signals also had negative peaks, the signal was processed further before extracting features from it. This makes the peaks more prominent, so that features can be extracted easily. The first derivative of a sample PPG signal is shown in Figure 7b, where the extracted features have been marked.
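As a brief illustration, the standard two-point central difference is d(j) = (a(j+1) - a(j-1)) / (2 Δt). The sketch below assumes this textbook form and a uniform sample spacing `dt`; the handling of the two end points is a choice made here, not something stated in the text.

```python
import numpy as np

def first_derivative(a, dt=1.0):
    """Two-point central difference: d[j] = (a[j+1] - a[j-1]) / (2 * dt)."""
    a = np.asarray(a, dtype=float)
    d = np.empty_like(a)
    d[1:-1] = (a[2:] - a[:-2]) / (2.0 * dt)
    # One-sided differences at the ends, where no central pair exists.
    d[0] = (a[1] - a[0]) / dt
    d[-1] = (a[-1] - a[-2]) / dt
    return d

# Check on a quadratic, for which the central difference is exact.
t = np.linspace(0.0, 1.0, 11)
d = first_derivative(t ** 2, dt=0.1)
```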

Second Derivative Features
The second derivative is the rate of change of the slope, that is, the derivative of the first derivative. It increased the number of peaks (features) in the signal even further than the first derivative did. A three-point central difference method was used to calculate the second derivative, as shown in Equation (3), where, as before, a(j) represents the points in the existing preprocessed array and d is the newly calculated second derivative of that array. The second derivative of a sample PPG signal is shown in Figure 7c, where the extracted features have also been marked. The figure shows how the first and second derivatives enhance the peaks of a PPG signal.
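The standard three-point central difference for the second derivative is d(j) = (a(j+1) - 2 a(j) + a(j-1)) / Δt². A minimal sketch under that assumption follows; the end points simply copy their neighbours so that the output keeps the input length, which is a convenience chosen for this example.

```python
import numpy as np

def second_derivative(a, dt=1.0):
    """Three-point central difference: (a[j+1] - 2*a[j] + a[j-1]) / dt**2."""
    a = np.asarray(a, dtype=float)
    d = np.zeros_like(a)
    d[1:-1] = (a[2:] - 2.0 * a[1:-1] + a[:-2]) / dt ** 2
    # End points copy their neighbours to preserve the array length.
    d[0], d[-1] = d[1], d[-2]
    return d

# Check on a cubic: the second derivative of t**3 is 6*t.
t = np.linspace(0.0, 1.0, 11)
d2 = second_derivative(t ** 3, dt=0.1)
```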

Regression Analysis
To estimate the glucose level, regression analysis was performed on the PPG signals acquired through a smartphone camera and their reference glucose values. Separate regression models were trained on the raw signals, the preprocessed signals, and the features extracted from the preprocessed signals, in order to evaluate and compare their performance in noninvasive quantitative estimation.
Regression techniques such as CLS (Classical Least Squares), PCA (Principal Component Analysis), PCR and PLS have been studied in the past for quantitative analysis. Both PLS and PCR have been widely used in chemometrics as multivariate calibration methods, and these methods can be applied when the dataset has correlated predictor variables [40]. Both regression methods create new predictor variables (components) as linear combinations of the original predictor variables. PLS creates these components while considering the observed response values, whereas PCR creates components without considering the response values at all. Both regression methods have reliable predictive power. PCR was chosen for this work due to its reported ability to predict glucose levels from NIR (Near Infrared) spectra with 94-95% accuracy [20].
Additionally, models were trained using SVR and RFR. SVR is a modification to the commonly used classification algorithm called Support Vector Machine (SVM). While SVM is primarily used in classification problems with discrete finite labels, SVR can be used to predict continuous variables such as blood glucose level. In SVR, instead of trying to minimize the error outright as with traditional regression techniques, hyperplanes are constructed to fit the error observed within a small threshold of ε (epsilon). Previous studies have used support vector-based techniques on PPG data to predict other continuous biological variables such as blood pressure estimation [43].
Finally, ensemble learning was used in the form of RFR, a technique that uses several decision trees to predict a continuous value. Classical Decision Tree Regression (DTR) involves gradually developing a decision tree from smaller subsets of the training data and can work reasonably well in mapping nonlinear functions. In RFR, the outputs from several decision trees are averaged together to provide an ensemble estimate of the predicted value. Other studies have demonstrated the efficacy of RFR in the prediction of biological variables from ECG or PPG data, often outperforming other regression techniques [44].
The following sections discuss the experiment performed on each of these learning methods in detail. For the purpose of clarity in discussion, we refer to PLS and PCR as statistical learning, and refer to SVR and RFR as machine learning, in order to reflect the commonly used terminology for each technique.

Experiment with Random Test Cases
A train-test split of 75/25 was used in the first experiment. The model that achieved high accuracy was then tested with fully unbiased, subject-wise test cases, which are elaborated in the subsequent sections. A large amount of variability was present in individual smartphone PPG signal instances, even when acquired from the same subject. Among these trials, there were PPG signals of different quality, as shown in Figure 8. The preprocessing steps were held constant throughout the trials: denoising through the Gaussian filter and baseline correction using ALS. The first derivative was computed from the preprocessed signal, after which only the positive and negative peak values were extracted as features. Figure 8 shows two representative PPG signals, their transformation after preprocessing, and the corresponding extracted features. First derivative features were extracted from both signals without any issue. The proposed signal processing model offers reliability in handling PPG signals of various qualities, and may therefore reduce the burden during data acquisition, as an acceptable PPG signal may be acquired in spite of baseline drift and high-frequency noise.
In this study, each recorded signal contains 60 s of data. However, 5 s was removed from both the beginning and the end of each recording, because large motion artifacts are naturally prevalent at the start and the conclusion of a recording. Several PCR, PLS, SVR and RFR models were developed with 50 s of raw PPG signal in every instance and with a variety of different features. These include models based on a combination of the PPG signal preprocessed via ALS and the Gaussian filter, and the first derivative and second derivative features extracted from the preprocessed PPG signal. The following sections outline the observed results from the various regression techniques used to predict blood glucose level and draw comparisons between the methods shown.
Figure 9 shows the estimated Standard Error of Prediction (SEP) against the number of principal components (PCs) for PCR models built with different approaches. For the PCR model built using second derivative characteristic features, the SEP was markedly greater than for the other approaches.
A significant reduction in SEP could be observed in the other approaches. Using a model built with the DelT features obtained from the signal preprocessed through Gaussian and ALS filtering, the SEP was observed to be 21.3 mg/dL with the first two PCs, whereas in the models that included first derivative and second derivative features, the lowest SEP was observed with five components at 18.33 mg/dL. The SEP fell further, to 17.09 mg/dL, for the PCR model built with first derivative characteristic points extracted from the preprocessed PPG using 11 PCs. The preprocessing and feature extraction approach significantly improved the prediction ability of the PCR model. Note that systolic and diastolic features alone did not improve the accuracy, and the performance with the second derivative features was not noteworthy either. The comparison of performances among the different PCR models, in terms of the lowest SEP achieved and the minimum number of PCs required, is shown in Table 2.
In this study, regression models were also built using PLS. Figure 10 shows the estimated SEP against the number of principal components (PCs) for PLS models built with different approaches. In general, there was a linear relationship between the number of PCs used and the corresponding SEP. The SEP increased with the number of principal components for the PLS model built with the first derivative signal: it was 22.09 mg/dL on the second component and increased to 50.59 mg/dL for 11 PCs. For the model built on the second derivative, the lowest SEP was recorded as 21.37 mg/dL on the second component. The SEP was lower still when the PLS model was built with first derivative characteristic points extracted from the preprocessed PPG, at 17.02 mg/dL for the second component. The comparison of performances among the different PLS models, in terms of the lowest SEP achieved and the minimum number of PCs required, can be found in Table 3.
It is important to note that with PLS-based models, all the lowest SEPs were achieved with just two principal components.
The SVR models were built with a Radial Basis Function (RBF) kernel. This kernel function implicitly maps an n-dimensional input to an m-dimensional space, where m is much higher than n. The RBF kernel finds the dot product in the higher-dimensional space more efficiently than regular linear or polynomial kernels. The main idea of using a kernel is to fit a linear classifier or regression curve in higher dimensions, which consequently produces a nonlinear classifier or regression curve in lower dimensions. In our study, the size of the kernel cache was set to 400 for training and the constant regularization parameter C was set to 1.0. The maximum penalty ε (epsilon) was tuned from 0 to 0.6 for each of the SVR models to reduce SEP. Figure 11 illustrates the estimated SEP of SVR models built with different features against epsilon tuned within this range. With the exception of the first derivative SVR model, a decreasing trend in SEP could be observed for the SVR models as epsilon increased from 0 to 0.5. The lowest SEP recorded for the different models is shown in Table 4. The SVR model with second derivative characteristic features achieved the lowest SEP of 18.52 mg/dL, even though the other models had comparable margins of error.
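The epsilon sweep can be sketched with scikit-learn's `SVR`, whose `kernel`, `C`, `epsilon` and `cache_size` parameters correspond to the settings described above. Whether the study used this library is an assumption; the data below are synthetic, the error is computed as the bias-corrected standard deviation of the residuals, and `svr_epsilon_sweep` is a hypothetical helper name.

```python
import numpy as np
from sklearn.svm import SVR

def svr_epsilon_sweep(X_train, y_train, X_test, y_test, epsilons):
    """Fit one RBF-kernel SVR per epsilon and return the SEP of each."""
    seps = {}
    for eps in epsilons:
        model = SVR(kernel="rbf", C=1.0, epsilon=float(eps), cache_size=400)
        pred = model.fit(X_train, y_train).predict(X_test)
        seps[round(float(eps), 1)] = float(np.std(pred - y_test, ddof=1))
    return seps

# Synthetic stand-in data with a nonlinear dependence on one feature.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 6))
y = 110.0 + 10.0 * np.tanh(X[:, 0]) + rng.normal(scale=2.0, size=100)
seps = svr_epsilon_sweep(X[:75], y[:75], X[75:], y[75:],
                         np.arange(0.0, 0.7, 0.1))
best_epsilon = min(seps, key=seps.get)
```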
Finally, similar models were built with Random Forest Regression (RFR) for the estimation of glucose level, and performances were evaluated using the same metric. RFR models are built with estimator trees/nodes, and in this study the total number of trees/nodes was increased up to 70 for all the models to identify the optimum number of trees required for prediction with the lowest error. Some sample leaves were also added for improvement, and their total number was set to six. The SEP of RFR models built with different features against the number of estimator trees/nodes can be found in Figure 12. The prediction error for all the RFR models decreased as the number of trees/nodes increased. This reduction in SEP was comparatively sharp for the model built with first derivative features, particularly for the first 15 trees.
The lowest SEP recorded for different models along with the number of trees required is provided in Table 5.
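A sweep over the number of trees can be sketched with scikit-learn's `RandomForestRegressor`. The use of this library is an assumption, and so is the reading of "sample leaves ... set to six" as `min_samples_leaf=6`. The data are synthetic and `rfr_tree_sweep` is a hypothetical helper name.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def rfr_tree_sweep(X_train, y_train, X_test, y_test, tree_counts, seed=0):
    """Fit one random forest per tree count and return the SEP of each."""
    seps = {}
    for n_trees in tree_counts:
        model = RandomForestRegressor(
            n_estimators=int(n_trees),
            min_samples_leaf=6,  # assumed meaning of "sample leaves = six"
            random_state=seed,
        )
        pred = model.fit(X_train, y_train).predict(X_test)
        seps[int(n_trees)] = float(np.std(pred - y_test, ddof=1))
    return seps

# Synthetic stand-in data; tree counts go up to 70 in steps of 5.
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 6))
y = 110.0 + 8.0 * X[:, 0] + rng.normal(scale=2.0, size=100)
seps = rfr_tree_sweep(X[:75], y[:75], X[75:], y[75:], range(5, 71, 5))
best_n_trees = min(seps, key=seps.get)
```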

Subject Wise Glucose Level Prediction
In this experiment, the PPG signals of ten subjects with glucose levels ranging from 73 to 153 mg/dL were separated from the training set at the outset, in order to ensure that the test set was independent of the training set. It is important to mention that some of the model-building parameters, such as the number of principal components, were retained from the previous experiment in this section of our study.
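A subject-wise hold-out of this kind can be sketched as follows. The total of 40 subjects, the three recordings per subject, and the function name `subject_wise_split` are hypothetical values chosen for the illustration; only the ten held-out subjects come from the text.

```python
import numpy as np

def subject_wise_split(subject_ids, n_test_subjects, seed=0):
    """Return boolean train/test masks with no subject in both sets."""
    subject_ids = np.asarray(subject_ids)
    rng = np.random.default_rng(seed)
    subjects = np.unique(subject_ids)
    test_subjects = rng.choice(subjects, size=n_test_subjects, replace=False)
    test_mask = np.isin(subject_ids, test_subjects)
    return ~test_mask, test_mask

# Hypothetical layout: 40 subjects with three recordings each.
subject_ids = np.repeat(np.arange(40), 3)
train_mask, test_mask = subject_wise_split(subject_ids, n_test_subjects=10)
```

Splitting by subject rather than by recording prevents signals from the same person from appearing on both sides of the split, which is what makes the test set unbiased.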

Statistical Approach
In the subject-wise tests, the performance of both PLS- and PCR-based models degraded slightly, as can be seen in Table 6. PCR models produced slightly better results than PLS. The models built with DelT features had the highest SEP for both PLS and PCR, at 27.78 and 28.57 mg/dL, respectively. The lowest SEP of 25.50 mg/dL was recorded with the PCR model built with first derivative features, which is 1.92 mg/dL lower than the best performing PLS-based model. It is important to note that the same model had the lowest SEP in the previous experiments too.

Machine Learning Approach
Subject-wise tests were also carried out for SVR- and RFR-based models to compare their SEPs with the previous results. Notably, the performance degraded here as well. A summary of the prediction error for both of these machine learning techniques is provided in Table 7. The SEPs for the different SVR models were very close to each other, differing by only ±0.1 mg/dL. The same is true for the RFR models. However, SVR models produced slightly better results than RFR overall. The lowest SEP was recorded to be 26.56 mg/dL for the SVR model built with second derivative characteristic points, whereas RFR had its lowest SEP of 29.6 mg/dL for the first derivative model. We have also performed k-fold cross-validation on the proposed models. The summary of the experiments and the results are provided in the Supplementary Material Tables S2 and S3.

Discussion
In this study, both statistical learning (PCR and PLS) and machine learning (SVR and RFR) models were built using various approaches to observe and compare their performances in terms of SEP. At first, the dataset was split in a 75:25 ratio into a model-building set and an independent test set. A minimum of three PPG signals were acquired from each human subject. The data were split into training and test sets randomly without any prior stratification. This experiment was useful for identifying the dominating features and some of the key model parameters for predicting glucose with the lowest error. Statistical learning methods performed well overall, and first derivative and second derivative features dominated in models built with both statistical and machine learning methods.
The SEP increased by around 8.5 mg/dL in the rigorous subject-wise testing. The key observation here is that the models that performed well in the earlier experiment maintained good performance in subject-wise testing too. PCR and PLS models built with first derivative characteristic points performed well in both types of experiments. The SEPs achieved with these models were comparatively lower than the others. Furthermore, the lowest SEPs for these models were recorded for the same number of principal components. This shows the possibility of using the first experiment for the selection of dominating features, the optimum model, and key model parameters, which can then be used for real tests.
In this study, PCR- and PLS-based models generally outperformed the models based on SVR and RFR, as seen in Figure 13. This could be attributed to the limited dataset used in the study. Zhang, G. et al. used PPG signals acquired from smartphones for the classification of the glucose level. Their model, with an accuracy of over 80%, can predict one of three diabetic statuses of a subject: normal, borderline, or warning. However, knowing the quantitative information of the actual glucose level is more important than knowing the diabetic status. We have demonstrated that our model can solve that problem by predicting the actual glucose level with an error of less than 20 mg/dL. First derivative and second derivative characteristic points were the dominating features with the machine learning models. SVR-based models performed comparatively well in prediction in both experiments, whereas RFR models performed poorly in both experiments.
Since most of the glucose levels in the dataset were in the range of 70-150 mg/dL, the efficacy of the proposed model is limited to the prediction of glucose levels within that range. We anticipate an improvement in the models with an increase in the amount and variety of the data in the near future. Robustness tests, platform independency tests, and user-side app development for instant analysis will be performed in the near future.

Conclusions
Here, we have investigated a noninvasive blood glucose estimation technique that uses only fingertip video acquired through the camera of a commercially available smartphone. The video data were converted into a PPG waveform. We have also presented a computational model comprising signal processing techniques for cleaning the data and extracting the features, and regression models trained with those features for the quantitative estimation of blood glucose. Smartphone-based data acquisition and disease diagnosis pose significant challenges because of motion artifacts and noise. This work, therefore, pays special attention to the data acquisition techniques and to the development of algorithms to remove these artifacts. Numerous experiments were conducted with different acquisition techniques to identify a suitable method and duration of data acquisition. The models have been optimized by applying four regression algorithms along with different features. A PLS-based model could predict glucose with an SEP as low as 17.71 mg/dL, which is comparable to the overall lowest SEP achieved by PLS, 17.02 mg/dL. The experimental results confirm the usefulness of these first development steps towards a novel smartphone-based noninvasive estimation of blood glucose. Additional experiments are planned to test robustness and platform independence and, finally, to implement the technique in a user-side app for instant measurement. We believe that the proposed noninvasive technique is important to the community because the system uses a much more natural and comfortable method of data acquisition than the traditional methods, can predict the glucose value with high accuracy, and has the potential to acquire and deliver glucose level information through only a smartphone, which is now available to most people.
Figure S6: (a) A sample raw PPG signal with high-frequency noise and (b) its version filtered through a Gaussian filter, in which baseline issues are still evident; Figure S7: (a) A sample raw PPG signal with high baseline variations and (b) its ALS-corrected version, which has no baseline problems but in which high-frequency noise is still evident; Table S1: Second-derivative features extracted from PPG signals acquired with the One Plus 6T; Table S2: K-fold CV of different approaches for PCR and PLS; Table S3: K-fold CV of different approaches for SVR and RFR.
Author Contributions: All authors listed contributed equally to the conception, design, analysis, drafting, and revision of this work. All authors have read and agreed to the published version of the manuscript.
Data Availability Statement: Data collection for this project is still ongoing. To maintain subject confidentiality, the raw dataset cannot be released at this time.