Non-Invasive Classification of Blood Glucose Level for Early Detection Diabetes Based on Photoplethysmography Signal

Abstract: Monitoring systems for the early detection of diabetes are essential to avoid potentially expensive medical costs. Currently, only invasive monitoring methods are commercially available. These methods have significant disadvantages, as patients experience discomfort while obtaining blood samples. A non-invasive method of blood glucose level (BGL) monitoring that is painless and low-cost would address the limitations of invasive techniques. Photoplethysmography (PPG) collects a signal from a finger sensor using a photodiode and a nearby infrared LED. The combination of the PPG electronic circuit with artificial intelligence makes it possible to implement the classification of BGL. However, one major constraint of deep learning is the long training phase. We try to overcome this limitation and offer a concept for classifying type 2 diabetes (T2D) using a machine learning algorithm based on PPG. We gathered 400 raw datasets of BGL measured with PPG and divided these points into two classification levels, according to the National Institute for Clinical Excellence, namely, “normal” and “diabetes”. Based on the results for testing between the models, the ensemble bagged trees algorithm achieved the best results with an accuracy of 98%.


Introduction
Diabetes is a medical emergency that can cause various complications or even death in adults. Diabetes is often referred to as the "Silent Killer" [1,2] because, if it is not diagnosed in its early stages, many people ignore the condition and do not realize its severity or presence until their blood sugar is above the standard range. A 2017 report by the International Diabetes Federation stated that the prevalence of diabetes would continue to increase globally [3].
The two major types of diabetes, type 1 (T1D) and type 2 (T2D), have the same symptoms and are distinguished by their cause. T1D is caused by a deficiency of insulin-secreting β-cells in the pancreas and cannot be cured, but the patient's quality of life can be maintained. T2D is characterized by insulin resistance, which develops throughout the patient's life. As T2D is an acquired metabolic condition, people have a higher probability of having T2D than T1D [4]. T2D can be avoided or reduced by a better understanding of the risks and by lifestyle changes. Effective prevention by regularly monitoring blood glucose levels (BGL) is necessary for diabetes management [5][6][7].
Currently, only invasive methods for monitoring BGL are commercially available. Blood glucose laboratory tests and glucometers are the only commercial options. Both ways are painful for the patient, as they use either a syringe or a finger prick procedure. However, the invasive methods provide a high degree of accuracy [8].
Non-invasive methods can be developed to overcome the limitations of invasive procedures [9]. A painless, easy-to-operate, and low-cost method will improve patient compliance with routine blood glucose monitoring and early detection of diabetes [8,10]. Non-invasive methods to monitor or predict BGL have been developed using different techniques, as shown in Figure 1 [1,8,10-14]. The preferred approach is based on sensors and optical techniques [11,15-17]. However, these non-invasive methods are still in the early stages of development [18]. Various research studies with new techniques have emerged, requiring constant updating of the available information. One of these new methods uses photoplethysmography (PPG), as shown in Figure 1.
PPG is a non-invasive optical technique that measures changes in blood volume based on variations in the intensity of light that passes through or is reflected by human organs [19][20][21]. The PPG signal is collected from a finger sensor that uses a photodiode as an optical sensor to capture light from an infrared LED, as shown in Figure 2. A PPG prototype is an easy and inexpensive optical measurement method that could monitor BGL in diabetes [14,17,19]. Many academics have developed prototypes with optical sensor-based data-acquisition systems. The prototypes are supported by the development of embedded system technology and relatively affordable sensor prices. PPG signals contain a lot of information about blood [18]. The PPG signal can be explored in assessing vital signs such as blood pressure, cardiac output, respiration, and the presence of diabetes. PPG can easily capture sequential heartbeats, making it possible to measure not only heart rate (HR) but also heart rate variability (HRV) [22,23]. The HRV signal derived from an electrocardiogram (ECG) is calculated from the R-R interval. In contrast, the calculation of the HRV signal from the PPG signal uses the inter-beat interval (IBI) or pulse interval (PPI) [24], as shown in Figure 3. HRV measures how the time between successive heartbeats varies, and it has been widely used as an indicator of various disease conditions. HRV strongly reflects the parasympathetic and sympathetic branches of the autonomic nervous system (ANS) [25]. The ANS regulates the body's metabolic system, which plays an essential role in regulating blood glucose levels [26].
Previous studies identified several associations between HRV and BGL. These associations were only found in T2D. Meanwhile, the BGL of T1D patients does not significantly correlate with HRV parameters [27]. Parasympathetic and sympathetic activity is decreased due to the detrimental effects of glucose metabolism. This decrease can be monitored via PPG morphology by calculating HRV [28].
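To make the IBI-based HRV calculation concrete, the following minimal Python sketch derives inter-beat intervals from PPG peak times and computes two common time-domain HRV measures. The function name `hrv_time_domain`, the choice of SDNN and RMSSD, and the example peak times are illustrative assumptions; they are not taken from this study.

```python
import math
import statistics


def hrv_time_domain(peak_times_s):
    """Compute mean HR (bpm), SDNN (ms) and RMSSD (ms) from PPG peak times in seconds.

    Hypothetical helper for illustration; the metrics are standard time-domain
    HRV measures, not necessarily the ones used in the cited studies.
    """
    # Inter-beat intervals (IBI) in milliseconds between successive pulse peaks.
    ibi_ms = [(b - a) * 1000.0 for a, b in zip(peak_times_s, peak_times_s[1:])]
    if len(ibi_ms) < 2:
        raise ValueError("at least three peaks are needed")
    mean_ibi = statistics.fmean(ibi_ms)
    # SDNN: sample standard deviation of the intervals.
    sdnn = statistics.stdev(ibi_ms)
    # RMSSD: root mean square of successive interval differences.
    diffs = [b - a for a, b in zip(ibi_ms, ibi_ms[1:])]
    rmssd = math.sqrt(statistics.fmean([d * d for d in diffs]))
    return {"mean_hr_bpm": 60000.0 / mean_ibi, "sdnn_ms": sdnn, "rmssd_ms": rmssd}


# Made-up peak times (seconds) giving intervals of about 800, 820, 780 and 820 ms.
peaks = [0.00, 0.80, 1.62, 2.40, 3.22]
metrics = hrv_time_domain(peaks)
```

In practice the peak times would come from a peak detector applied to the filtered PPG waveform; that step is omitted here.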
Many aspects of PPG morphology remain unexplored. PPG morphology shows a different pattern between healthy and diabetic subjects of the same age in three different age ranges [28]. While healthy subjects have a slightly prolonged shape in the diastolic phase, people with diabetes usually have a bell-shaped PPG waveform without a secondary peak. Diabetes causes arterial stiffness [29][30][31]. The stiffness index (SI) is a standard marker for significant arterial stiffness. The SI value is calculated as the subject's height divided by the peak-to-peak time (PTT) [32], as shown in Figure 4. Diabetic patients have a significantly higher SI than non-diabetic patients [33]. A strong correlation with blood pressure, body mass index, and age was noted [33].
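The stiffness index defined in this section (height divided by peak-to-peak time) can be illustrated with a short Python sketch. The function name, the assumption that the two peaks are the systolic and diastolic peaks of a single pulse, and the example values are hypothetical and chosen only to show the arithmetic.

```python
def stiffness_index(height_m, first_peak_s, second_peak_s):
    """Return SI in m/s as subject height divided by peak-to-peak time (PTT).

    Assumption: first_peak_s and second_peak_s are the times of the systolic
    and diastolic peaks within one PPG pulse.
    """
    ptt_s = second_peak_s - first_peak_s
    if ptt_s <= 0:
        raise ValueError("second peak must occur after the first peak")
    return height_m / ptt_s


# Illustrative values only: a 1.70 m subject with a PTT of 0.25 s.
si = stiffness_index(1.70, 0.20, 0.45)
```

A shorter PTT for the same height gives a larger SI, which is consistent with the higher SI reported for diabetic patients.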
There are several potential sources of error in BGL estimation based on PPG morphology, including:
A. PPG is very sensitive to movement, which makes the signal unstable and can cause errors in measurement [34][35][36].
B. The quality of the PPG waveform depends on the quality of the blood circulation.
C. Characteristics of the PPG waveforms vary according to blood viscosity, vascular elasticity, and fluctuations in peripheral vascular resistance [37].
The combination of PPG data collected by near-infrared sensors with artificial intelligence (AI) makes it possible to predict BGL. Machine learning (ML) or deep learning (DL) approaches are equally good at predicting BGL using PPG [14,[38][39][40][41][42][43][44]. A common challenge in implementing ML or DL is limited test samples. Currently, no data-source website contains the digital records of physiological signals, time series, and related data needed in this biomedical research.
Our proposed method is highly relevant to obtaining a BGL classification in real time. Most previous studies of PPG focused on estimating the BGL value [18,45,46]. The drawback of this estimation method is that clinical supervision is required. Meanwhile, the classification method helps patients to self-monitor for diabetes detection independently. Our main contributions are as follows:
A. With our proposed method, users can immediately know the condition of their BGL. We focus on a BGL classification based on the PPG signal. Therefore, in this study, two BGL levels were established: "normal" and "diabetes".
B. Our proposed method does not need a separate process to establish the PPG signal's quality. We have used electronic filter circuits instead of filter algorithms such as wavelets.
C. Our proposed method uses ML instead of DL to achieve a faster training time. Deep learning uses extensive computing power and takes a long time to train, making it difficult to validate and repeat extensively to improve results [47].
This article proposes a non-invasive classification of BGL for early detection of diabetes based on a PPG signal by utilizing a combination of electronic circuits with a machine learning algorithm (MLA). The classifier model is chosen from the results of a comparative analysis between models, such as subspace K-nearest neighbor, RUS boosted trees, ensemble bagged trees, decision trees, support vector machine, Gaussian support vector machine (GSVM), bagged trees, K-nearest neighbor (KNN), fine Gaussian support vector regression (FGSVR), support vector regression quadratic, and enabled boosted trees.
This study included 80 subjects and collected 400 original PPG signals from students at the Indonesian Defense University and patients at the Wocare Diabetes Clinic. Feature extraction of the PPG signal was carried out point-by-point, so the physiological data contained in the PPG signal could be explored. Each PPG data segment consists of 2100 sampling points. After all the data were collected, the datasets were divided into a training set (300 PPG signals) to train the classifier and a test set (100 PPG signals) to test the accuracy of the classifier. Datasets were added to avoid bias by duplicating signal data from each classification level to give each group the same number of training datasets (150 normal subjects and 150 diabetes subjects).
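The class-balancing step described above (duplicating signals so that each class reaches the same number of training examples) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `balance_by_duplication`, the random choice of which signals to duplicate, and the toy data are hypothetical, and the study does not specify how the duplicated signals were selected.

```python
import random
from collections import Counter


def balance_by_duplication(signals, labels, target_per_class, seed=0):
    """Duplicate randomly chosen signals until every class has target_per_class items."""
    rng = random.Random(seed)
    by_class = {}
    for signal, label in zip(signals, labels):
        by_class.setdefault(label, []).append(signal)
    out_signals, out_labels = [], []
    for label, items in by_class.items():
        if len(items) > target_per_class:
            raise ValueError("class %r already exceeds the target size" % label)
        # Draw the missing examples with replacement from the existing ones.
        extra = [rng.choice(items) for _ in range(target_per_class - len(items))]
        for signal in items + extra:
            out_signals.append(signal)
            out_labels.append(label)
    return out_signals, out_labels


# Toy training set: five "normal" and two "diabetes" signals (short placeholder lists).
toy_signals = [[0.1 * i, 0.2 * i] for i in range(7)]
toy_labels = ["normal"] * 5 + ["diabetes"] * 2
balanced_signals, balanced_labels = balance_by_duplication(toy_signals, toy_labels, 5)
class_counts = Counter(balanced_labels)
```

Duplication of this kind is normally applied to the training set only, so that copies of the same signal do not appear in both the training and test sets.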
This paper is structured as follows: Section 2 defines the materials and methods; Section 3 explains the measurement and verification results; Section 4 discusses the results; Section 5 concludes the study.

Materials and Methods
The essential component in this research is the PPG signal, which was collected through an electronic circuit in a finger sensor. The resulting PPG signal cannot be directly used for BGL classification. We used an indirect methodology that utilizes patterns and sampling points from PPG signals, which are then processed using the MLA. PPG datasets from normal and diabetic subjects are required to obtain the PPG signal patterns associated with BGL. The advantage of artificial intelligence is that it learns patterns related to the desired results, as can be seen in Figure 5. The proposed method presents a system that can be used for non-invasive BGL classification based on a single PPG waveform from an electronic circuit combined with a machine learning algorithm.

Data Collection
We collected the gold-standard dataset using an invasive glucometer. The PPG signal was collected using a non-invasive finger sensor. Each PPG signal was labelled "normal" or "diabetes" based on the results of the glucometer measurement. The MLA processed the signal measured by the PPG hardware to produce a patient classification. Once the PPG classification was complete, the result was compared to the glucometer classification and placed into the confusion matrix. The blue cells represent true positive (TP) or true negative (TN) signals, as shown in Figure 6.

We also collected data on the basic physiology of the individual. The dataset collected for each time point included the participant's PPG waveform signal, blood pressure, blood oxygen saturation level, heart rate, and glucometer BGL, as shown in Table 1. Each of these parameters was measured five times for each subject. The dataset includes PPG and BGL information from the subjects, divided according to BGL values based on the 2019 National Institute for Clinical Excellence (NICE) guidelines, as shown in Table 2.

Table 2. BGL categories according to NICE (2019).
Plasma Glucose Test    Normal                        Prediabetes                        Diabetes
Random                 <200 mg/dL or <11.1 mmol/L    N/A                                >200 mg/dL or >11.1 mmol/L
Fasting                <100 mg/dL or <5.5 mmol/L     100-125 mg/dL or 5.5-6.9 mmol/L    >126 mg/dL or >7 mmol/L
Figure 6. Schematic diagram of non-invasive classification of BGL for early detection of diabetes using a machine learning algorithm based on a photoplethysmography signal. Processing steps: data collection, data segmentation, labelling, and results (classification).
Glucometers can measure BGL using a random or fasting method. Random glucose tests can be taken at any time and do not require preparation. Fasting plasma glucose tests are performed after a minimum of eight hours of fasting. In this study, we assessed BGL using the random test method. According to the NICE guidelines, we classified participants with a BGL of >200 mg/dL into the "diabetes" dataset.
The total duration of the experiment was approximately 15 min. Collecting the PPG and BGL data took around five minutes. We divided the data into "signals" and "labels" groups. The signal is a cell array consisting of a collection of PPG signals, while the label is a description of "normal" or "diabetes" based on the NICE 2019 guidelines, as shown in Figure 7.
We selected the signals based on the waveform to guard against low-quality PPG signals. Each PPG signal segment was manually evaluated and classified as an unfit, acceptable, or excellent PPG waveform, as described in Figure 8 [34]. PPG waves in the excellent category had a clear beat with a diastolic notch. The acceptable category had a clear beat without a diastolic notch, and the unfit category contained many noisy beats without a diastolic notch. This method is inspired by the skewness signal quality index (SSQI), which measures the degree of asymmetry of the distribution around the mean. Positive skewness indicates an asymmetric distribution towards positive values. Negative skewness indicates an asymmetric distribution towards negative values [34,48].
Figure 8. The PPG signal selection is made based on its waveform [34].
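The skewness that inspires the signal selection can be computed directly from the samples of a segment. The sketch below uses the population skewness (third central moment divided by the 1.5 power of the second central moment); the function name and the toy inputs are illustrative assumptions. In this study the waveforms were evaluated manually, so this code is not the authors' selection procedure.

```python
def skewness(samples):
    """Population skewness: positive for a tail towards larger values, negative for smaller."""
    n = len(samples)
    if n == 0:
        raise ValueError("samples must not be empty")
    mean = sum(samples) / n
    m2 = sum((x - mean) ** 2 for x in samples) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in samples) / n  # third central moment
    if m2 == 0:
        return 0.0  # constant signal: no asymmetry to measure
    return m3 / m2 ** 1.5


symmetric = skewness([1, 2, 3, 4, 5])        # symmetric around the mean
right_tailed = skewness([1, 1, 1, 1, 10])    # long tail towards positive values
left_tailed = skewness([-10, 1, 1, 1, 1])    # long tail towards negative values
```

A threshold on this value could in principle replace the manual three-category evaluation, but any such threshold would have to be validated against the manual labels.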
The PPG signal generated from the electronic circuit was processed and saved in a comma-separated values (CSV) format using processing application software. Each PPG data segment consisted of 2100 sampling points, which required 10 s of data access. We used a frequency of 190 Hz during signal acquisition, as shown in Figure 9. Processing software allowed us to view each subject's PPG waveform in real-time. We selected the PPG waveforms manually and analyzed them one by one so that only PPG waveforms in the excellent or acceptable category were stored. This process was developed to reduce PPG segments with high motion artefacts.
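The segmentation and labelling steps can be sketched as below. The segment length of 2100 points and the >200 mg/dL random-test rule come from the text; the function names, the idea of cutting one longer recording into consecutive segments, and the treatment of a reading of exactly 200 mg/dL as "normal" are assumptions made for illustration.

```python
SEGMENT_LENGTH = 2100             # sampling points per PPG segment (from the text)
DIABETES_THRESHOLD_MG_DL = 200.0  # random-test threshold (from the text)


def label_from_random_bgl(bgl_mg_dl):
    """Label a glucometer reading; a value of exactly 200 is treated as normal (assumption)."""
    return "diabetes" if bgl_mg_dl > DIABETES_THRESHOLD_MG_DL else "normal"


def split_into_segments(samples, segment_length=SEGMENT_LENGTH):
    """Cut a recording into consecutive full-length segments, dropping any remainder."""
    n_full = len(samples) // segment_length
    return [samples[i * segment_length:(i + 1) * segment_length] for i in range(n_full)]


# Placeholder recording of 4500 samples and a made-up glucometer reading.
recording = [0.0] * 4500
segments = split_into_segments(recording)
labels = [label_from_random_bgl(215.0)] * len(segments)
```

Every segment inherits the label of the glucometer reading taken for that recording, which matches the "signals" and "labels" grouping described earlier.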

Electronic Circuits
In this study, we used the Easy Pulse design by Embedded Lab, an open-source pulse sensor, to measure cardiovascular pulse waves by detecting changes in blood volume in blood vessels close to the skin, which generate PPG signals [49]. The hardware configuration includes the finger sensor, the filtering circuit, the amplifier circuit, and the analog-to-digital converter circuit. The finger sensor uses the HRM-2511E, which consists of a near-infrared LED and a photodiode [49]. This configuration detects changes in the blood volume during the systolic and diastolic phases in the form of a PPG waveform. The PPG signal from the photodetector is still weak and noisy, so it requires a signal amplifier circuit and a filter to suppress noise. There are two stages of signal processing. In Stage 1, the DC component of the PPG signal is blocked by a passive high-pass filter (HPF) that combines a resistor (68 kΩ) and a capacitor (4.7 µF), resulting in a cut-off frequency of 0.5 Hz. The PPG signal from the HPF is still noisy, so it is processed further through an active low-pass filter (LPF) based on an operational amplifier (op-amp) in non-inverting mode that has a gain of 48 and a cut-off frequency of 3.4 Hz. At the negative input, the op-amp uses a reference voltage (Vref) of 2.0 V coming from the Zener diode so that the PPG signal can reach full swing at the op-amp output. The output of the active LPF goes to Stage 2, which is a replica of the Stage 1 circuit. The analog PPG signal from Stage 2 is converted into a digital signal by an analog-to-digital converter circuit and used as raw data input for the ML models, as shown in Figure 10.
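The quoted HPF cut-off frequency can be checked with the first-order RC formula f_c = 1/(2πRC). The sketch below reproduces that arithmetic for the stated 68 kΩ and 4.7 µF values and estimates the in-band gain of one stage at a typical pulse frequency. Treating both filters as ideal first-order sections and using 1.2 Hz (72 beats per minute) as the pulse frequency are assumptions made for illustration.

```python
import math


def rc_cutoff_hz(resistance_ohm, capacitance_f):
    """Cut-off frequency of a first-order RC filter."""
    return 1.0 / (2.0 * math.pi * resistance_ohm * capacitance_f)


def highpass_gain(freq_hz, cutoff_hz):
    """Magnitude response of an ideal first-order high-pass filter."""
    ratio = freq_hz / cutoff_hz
    return ratio / math.sqrt(1.0 + ratio ** 2)


def lowpass_gain(freq_hz, cutoff_hz):
    """Magnitude response of an ideal first-order low-pass filter."""
    return 1.0 / math.sqrt(1.0 + (freq_hz / cutoff_hz) ** 2)


hpf_cutoff = rc_cutoff_hz(68e3, 4.7e-6)  # about 0.498 Hz, i.e. the quoted 0.5 Hz
pulse_hz = 1.2                           # assumed pulse frequency (72 bpm)
# Approximate gain of one stage at the pulse frequency: op-amp gain of 48,
# reduced slightly by the HPF and by the LPF with its 3.4 Hz cut-off.
stage_gain = 48 * highpass_gain(pulse_hz, hpf_cutoff) * lowpass_gain(pulse_hz, 3.4)
```

The pass band between roughly 0.5 Hz and 3.4 Hz covers the pulse frequencies expected for resting heart rates, which is why the pulsatile component is amplified while the DC offset and higher-frequency noise are attenuated.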


Classifier
We used MATLAB (version R2021b) as a platform to choose the best machine learning classifier. The platform enables the algorithm to perform BGL classification based on PPG signals with an optimal level of accuracy. All models were tested on the same dataset, as shown in Figure 11.


Ensemble Classification
An ensemble is a method that combines many single classifiers so that the predictions of the individual classifiers are integrated into a final prediction through a majority voting process. The ensemble method was developed to improve classification prediction accuracy.
Ensemble techniques that rely on bagging and boosting can provide predictions with excellent accuracy. Bagging builds the ensemble's component models on different resampled versions of the training data so that a wide range of possibilities is accommodated, while boosting works iteratively so that cases that are hard to predict receive more attention. Given current developments, the ensemble approach is a good choice for those who want to obtain satisfactory results with a method that is easy to work with.
The number of splits in the ensemble algorithm affects the accuracy of the results [50]. The model performs better as the number of splits increases. However, too large a number of splits creates the possibility of overfitting. In addition, increasing the number of parameters studied further increases the accuracy of the model, but the process becomes very time-consuming. This study selected an ensemble bagged trees model for non-invasive BGL classification based on a single PPG waveform. The ensemble bagged trees algorithm (EBTA) is a method that can improve the results of machine learning classification algorithms by combining the predictions of several models. It is used to overcome instability in complex models with relatively small datasets. Bagging is best suited for problems with relatively small training datasets. Bagging uses bootstrap sampling to obtain data subsets on which different base learners are trained. Bagging also adopts a base-learner output-aggregation strategy, namely voting for classification and averaging for regression [50], as shown in Figure 12.
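The bagging procedure described above (bootstrap sampling, one base learner per sample, majority voting) can be sketched in a few lines of Python. This is a simplified illustration, not the MATLAB ensemble bagged trees model used in this study: the base learner is a one-split decision stump rather than a full tree, and all function names and the toy data are hypothetical.

```python
import random
from collections import Counter


def fit_stump(X, y):
    """Fit a one-split tree: pick the feature/threshold pair with the fewest training errors."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            left_lab = Counter(left).most_common(1)[0][0]
            right_lab = Counter(right).most_common(1)[0][0] if right else left_lab
            errors = sum(lab != left_lab for lab in left)
            errors += sum(lab != right_lab for lab in right)
            if best is None or errors < best[0]:
                best = (errors, j, t, left_lab, right_lab)
    return best[1:]


def predict_stump(stump, row):
    j, t, left_lab, right_lab = stump
    return left_lab if row[j] <= t else right_lab


def fit_bagged(X, y, n_learners=25, seed=0):
    """Train each base learner on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(X)
    learners = []
    for _ in range(n_learners):
        idx = [rng.randrange(n) for _ in range(n)]
        learners.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return learners


def predict_bagged(learners, row):
    """Aggregate the base learners' predictions by majority vote."""
    votes = Counter(predict_stump(stump, row) for stump in learners)
    return votes.most_common(1)[0][0]


# Toy one-feature dataset; the feature values are made up for illustration.
X = [[0.10], [0.20], [0.30], [0.35], [0.70], [0.80], [0.90], [0.95]]
y = ["normal"] * 4 + ["diabetes"] * 4
model = fit_bagged(X, y)
low_prediction = predict_bagged(model, [0.05])
high_prediction = predict_bagged(model, [0.99])
```

Because each stump sees a slightly different bootstrap sample, individual stumps may place their thresholds differently; the vote smooths out that instability, which is the property the text attributes to bagging on small datasets.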

Results
When a finger is inserted into the finger sensor, the raw PPG waveform can be observed at Test Point 1 (TP1) using an oscilloscope. The voltage generated by the finger sensor is very low and noisy, only about 80 mV, as shown in Figure 13. The raw PPG signal from TP1 then enters the second stage, where it is filtered to reduce noise. The first stage output signal (TP2) still contains a significant DC component and a small pulsatile amplitude. The output of the HPF goes to an active LPF, where the op-amp amplifies the PPG signal so that the rated PPG signal voltage is 0.8 V. Noise in the signal can be eliminated in this phase, as shown in Figure 14. The PPG signal on TP2 coming from the second stage is amplified by an op-amp circuit, with amplification within the reference voltage range (V reference = 2.0 V), as shown in Figure 15.
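As a rough check on the figures above, the ratio between the roughly 80 mV raw signal and the 0.8 V conditioned signal implies an overall voltage gain of about 10, or 20 dB. The Python sketch below only performs that arithmetic. It assumes both values are comparable signal amplitudes, which the text does not state explicitly, and the function name is illustrative.

```python
import math

def voltage_gain(v_in, v_out):
    """Return the linear voltage gain and its value in decibels."""
    gain = v_out / v_in
    return gain, 20 * math.log10(gain)

# Values quoted in the text: about 80 mV at TP1, 0.8 V after the active LPF.
gain, gain_db = voltage_gain(0.080, 0.8)
```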

The PPG signal from the second stage circuit is still analog and must be converted to digital in the analog-to-digital converter (ADC) circuit. The resulting digital signal is a series of digital pulses synchronized with the heart rate monitor on the digital output pin, as shown in Figure 16.
We compared the testing performance based on the accuracy values. The results show that the EBTA had a better accuracy score than any other classification method, as shown in Table 3. A confusion matrix is used to visualize the performance of the classifiers for the data sets where the true value is known. The confusion matrix from the testing process of each model is shown in Figure 17.
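To make the link between a confusion matrix and an accuracy value concrete, the following Python sketch tabulates true against predicted labels and divides the diagonal sum by the total count. The label lists are hypothetical and are not the test data of this study. The function names are illustrative.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]

def accuracy(matrix):
    """Fraction of samples on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Hypothetical labels for five test samples.
labels = ["normal", "diabetes"]
y_true = ["normal", "normal", "diabetes", "diabetes", "diabetes"]
y_pred = ["normal", "diabetes", "diabetes", "diabetes", "diabetes"]

cm = confusion_matrix(y_true, y_pred, labels)
acc = accuracy(cm)
```

In this toy case one "normal" sample is predicted as "diabetes", so the matrix has a single off-diagonal entry and the accuracy is four out of five.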
We performed a comparative study between our method and the results of previous studies. Table 4 presents a performance comparison to prior studies.

Discussion
The quality of the finger sensor needs to be considered because it is tied to the PPG principle, a non-invasive method that measures variations in blood volume in tissue using a light source and a detector. The signal quality significantly affects the final ML classification results. Ensuring PPG signal quality is the main reason for using an electronic filter circuit instead of a filtering algorithm such as the wavelet transform. As shown in Figure 18, an analysis of PPG signal quality demonstrates a difference between electronic circuit filtering and the wavelet transform.
The PPG signal filtered by the electronic circuit is of better quality, as indicated by the visible diastolic notch. The feature points on each PPG signal contain essential information that affects the accuracy of the ML classification results. If the signal is filtered through a wavelet transform, important information may be removed from the PPG signal during the derivation process.
Data distribution is a function that shows all the data values and how often these values occur. The data distribution needs to be checked to ensure that there are no outliers, so that the results reach the optimal level of accuracy, as shown in Figure 19.
After obtaining a high-quality raw PPG signal, the next step is to choose an MLA that best fits the non-linear character of the PPG signal. Algorithm selection was based on training time, total misclassification cost, and accuracy of results. The top three were obtained after testing several MLAs on the same dataset.
The highest-ranked algorithm was the EBTA with 98% accuracy, a training time of 8 s, and a total misclassification cost of 2. The second-best algorithm was weighted KNN with 93% accuracy, a training time of 8.87 s, and a total misclassification cost of 7. The third algorithm was ensemble subspace KNN with 93% accuracy, a training time of 15 s, and a total misclassification cost of 7. While the algorithm Coarse Gaussian SVM had the fastest training time of 5 s, the accuracy of the results was only 91%, as shown in Table 3.
The total misclassification cost is an essential parameter for evaluating the classification performance of unbalanced data sets [51]. The cost of misclassification is a penalty for errors in the classification process. In binary classification problems, the costs incurred by different types of error may differ. The smaller the total misclassification cost, the better the classifier's performance. We have shown that the EBTA has a total misclassification cost of 2, which is why we rank it above the other machine learning algorithms tested. Based on these results, we chose the EBTA to classify BGL using non-invasive PPG.
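One common way to compute a total misclassification cost is to weight each cell of the confusion matrix by an entry of a cost matrix and sum the products. The Python sketch below assumes that definition. The confusion counts and the asymmetric cost values are hypothetical and chosen only for illustration. With unit costs for every error, the total equals the number of misclassified samples.

```python
def total_misclassification_cost(confusion, cost):
    """Sum of confusion[i][j] * cost[i][j]; rows are true classes."""
    return sum(
        confusion[i][j] * cost[i][j]
        for i in range(len(confusion))
        for j in range(len(confusion[i]))
    )

# Hypothetical counts: rows/columns ordered as ["normal", "diabetes"].
confusion = [[49, 1],
             [1, 49]]

# Unit cost: every error is penalized equally, correct predictions cost 0.
unit_cost = [[0, 1],
             [1, 0]]

# Assumed asymmetric cost: a missed diabetes case is penalized five times
# more than a false alarm. The factor 5 is an illustrative choice.
asymmetric_cost = [[0, 1],
                   [5, 0]]

cost_unit = total_misclassification_cost(confusion, unit_cost)
cost_asym = total_misclassification_cost(confusion, asymmetric_cost)
```

Under unit costs the two errors give a total of 2. Under the asymmetric costs the same two errors give a total of 6, which shows how the ranking of classifiers can change when error types are weighted differently.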
We conducted a comparative study between our method and the results of previous studies. The comparison shows that the model we propose achieves a better accuracy score and overall outcome. In general, the conclusions of quantitative research in previous studies can differ even when the same classifier is used, as shown in Table 4. We focus on comparing our results with those of Zhang et al. [41] and Nirala et al. [42]. Nirala et al. proposed a Gaussian SVM method with a final accuracy of 97.8%, the highest accuracy reported before this study. The SVM algorithm is trained to label the input data into two categories separated by the widest possible margin. Because of this wide margin between the category labels, SVM is also known as a large margin classifier. However, the SVM model is a linear classifier and cannot map non-linear functions without a kernel, whereas the PPG signal is a non-linear function. When the EBTA produces a decision, it combines the outputs of several different models to reach a single conclusion. Technically, the classification process is simplified by implementing voting. The EBTA combines a configuration of trees that vote on each test case. A class is selected if it receives more votes than the other. Voting results are more consistent when more votes are used. The EBTA produces composite models, which usually perform better than single models such as SVM.
The study by Zhang et al. [41] is a meaningful comparison because it uses the same ensemble algorithm as our proposed method. However, the results reported by Zhang et al. have lower accuracy than our proposed model. The difference in accuracy is due to differences in the quality of the raw PPG signal and in the number of sample points used. In our method, each PPG signal is extracted into 2100 features, and feature extraction is performed point by point so that the physiological data in the PPG signal can be explored optimally.

Conclusions
ML is the best solution to replace the conventional extraction of PPG morphological features related to BGL. Our proposed method combines hardware with software to create a reliable system for BGL classification based on PPG. This classification method has advantages over other BGL estimation methods. Using our proposed method, users can immediately find out the condition of their BGL, which supports early detection of diabetes. This method can speed up the treatment process and reduce the risk of death. We chose an electronic component-based PPG signal filtering process instead of an algorithm to preserve every information point contained in the PPG signal. The selection of electronic circuit filtering is an advantage of the proposed method compared to previous studies. By maintaining the quality of the PPG signal from the beginning, a higher accuracy of the ML classification results can be achieved. Our proposed method also does not require the extraction of the morphological features of the PPG signal; therefore, it can be easily applied to many situations. Our proposed method (EBTA) achieves higher accuracy than subspace K-nearest neighbor, RUS boosted trees, ensemble bagged trees, decision trees, support vector machine, Gaussian support vector machine (GSVM), bagged trees, K-nearest neighbor (KNN), fine Gaussian support vector regression (FGSVR), and support vector regression quadratic. In addition, increased sample sizes could further improve the performance of BGL classification based on PPG signals. In future research, we will use an embedded system to unite the electronic circuit section with ML algorithms to realize handheld equipment.
Institutional Review Board Statement: Ethical review and approval were waived for this study because the study data were obtained from the diabetes clinic medical records.