1. Introduction
Diabetes affects over 10% of the global population [1]. It exists in several types. Although each type has its own primary causes, they all lead to elevated blood glucose levels (BGLs) and associated complications, which can result in serious health issues [2]. The chance of developing complications can be significantly reduced if the BGLs are monitored continuously and controlled in time. Therefore, continuous monitoring of BGLs plays an important role in the diagnosis of diabetes.
However, diabetes is a chronic disease that requires continuous monitoring of BGLs. If continuous blood glucose estimation is performed in clinics or hospitals, it places heavy pressure on the national medical system and the national transportation system, and reduces national productivity. To address this issue, continuous blood glucose estimation needs to be performed at home or at the workplace. The most common continuous blood glucose estimation method is invasive. However, the invasive approach causes pain to the subjects and carries a risk of infection.
In order to address these drawbacks, various non-invasive blood glucose estimation methods have been developed. In particular, photoplethysmograms (PPGs) are acquired from the fingertip using near-infrared (NIR) spectroscopy [3]. Then, features are extracted from the PPGs and the BGLs are estimated using machine learning models. Many methods have been developed for extracting features from single-channel PPGs. They include statistical features such as the skewness and the energy of the PPGs [4]. In addition, the logarithmic energy entropy, the Kaiser Teager energy (KTE), the spectral entropy, and the auto-regressive (AR) coefficients [5,6] are also employed as features. By integrating these types of features, the physiological characteristics of the PPGs, such as the shape, the spectral characteristics, and the energy distribution of the envelope, can be effectively described. Hence, the relationship between the PPGs and the BGLs can be more comprehensively exploited. As a result, these features are extensively employed in the estimation of BGLs [7]. Moreover, features derived from the mel frequency cepstral coefficients (MFCCs) are also employed. These features provide additional insight into small variations in the time-frequency characteristics of the PPGs, which is crucial for accurately detecting and monitoring changes in the BGLs [8]. Furthermore, features extracted from the heart rate variability (HRV) are utilized to evaluate the degree of sympathetic tone, providing further physiological information for monitoring the BGLs [9]. Since these approaches are easy to implement, several wearable blood glucose monitoring devices have been developed in recent years using them [6,10,11]. It is worth noting that the impedance spectroscopy approach places 56% of the estimated values in region A of the Clarke error grid [10]. Similarly, the combined impedance and multi-wavelength NIR spectroscopy approach achieves a mean absolute relative difference (MARD) of 20% with 60% of the estimated values in region A of the Clarke error grid [11]. However, these methods are based on single-channel PPGs. Hence, the achieved accuracy is limited and often fails to meet the accuracy and reliability requirements of many practical situations.
To address the above issues, multi-channel PPG acquisition devices have been developed [4,7,9]. Existing methods for extracting features from multi-channel PPG signals typically process each channel independently, extracting features from each channel separately before merging them into a single feature vector. For example, Tsai et al. collected dual-channel PPG signals from the finger and wrist, extracting time-domain and frequency-domain features independently from each channel [4]. Similarly, Gupta et al. obtained red, green, and infrared wavelength PPG signals from the finger and extracted independent features from each wavelength channel [7]. Wei et al. collected two PPG signals of the same wavelength from the finger and extracted heart rate variability (HRV) features independently from each [9]. While these approaches are relatively simple and intuitive and benefit from the richer information provided by multi-channel signals, they exhibit notable limitations. Specifically, because the feature extraction is performed on a per-channel basis, these methods cannot effectively exploit the coupling or interdependence among different PPG channels [12]. This inter-channel information may carry important physiological insights that are lost in independent processing pipelines.
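The per-channel strategy described above can be illustrated with a minimal sketch. It assumes two of the statistical features mentioned earlier (skewness and energy) and two toy channels; the function names and the data are hypothetical and do not reproduce the feature sets of the cited works.

```python
import math

def skewness(x):
    """Population skewness: third central moment divided by variance**1.5."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    return m3 / m2 ** 1.5

def energy(x):
    """Signal energy: sum of squared samples."""
    return sum(v * v for v in x)

def per_channel_features(channels):
    """Extract features from each channel on its own, then concatenate.

    No term in the result depends on more than one channel, which is the
    limitation discussed in the text.
    """
    features = []
    for channel in channels:
        features.extend([skewness(channel), energy(channel)])
    return features

# Two hypothetical toy channels (illustrative values only).
channels = [[0.0, 1.0, 0.0, -1.0], [1.0, 2.0, 4.0, 1.0]]
feature_vector = per_channel_features(channels)
```

Each entry of `feature_vector` is computed from a single channel, so any coupling between the channels is invisible to the downstream regressor.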
It is worth noting that quaternion-valued classification techniques have been increasingly employed for processing multi-channel human physiological signals to improve predictive performance [13,14]. Compared with traditional real-valued or complex-valued approaches, quaternion-valued representations offer a unique advantage in modeling multidimensional inter-channel correlations within a compact and unified mathematical framework. This capability is particularly beneficial for multi-channel signal fusion and joint feature extraction. For example, in [13], EEG signals were first decomposed into four canonical brain wave components (delta, theta, alpha, and beta) using fast Fourier transform (FFT), and then modeled as quaternion-valued signals to preserve the intrinsic cross-band dependencies. Subsequently, quaternion-valued singular spectrum analysis (QSSA) was applied for feature extraction, followed by classification for sleep staging. This method achieved a classification accuracy of 97.5%, which outperformed several existing real-valued approaches, demonstrating the effectiveness of quaternion-based modeling in capturing both spatial and spectral features. In another example [14], nine optical path-length NIR spectral signals were acquired using near-infrared spectroscopy, and three channels were randomly selected and encoded as a quaternion-valued signal to preserve inter-channel spectral relationships. Quaternion principal component analysis (QPCA) was then employed to extract features jointly from the three channels, followed by regression using support vector regression (SVR). This approach significantly outperformed traditional single-wavelength PCA-based models, achieving a correlation coefficient of 0.9854. These two applications demonstrate the effectiveness and flexibility of quaternion-valued signal processing techniques in handling multi-channel signals, particularly in scenarios where preserving inter-channel dependencies is essential for accurate analysis.
This paper proposes a non-invasive approach for performing blood glucose monitoring via a wearable device and a smartphone. Hence, it can provide a home-based solution for performing continuous blood glucose estimation and diabetes management. The wearable device comprises several PPG sensors. Since no consumables are required, this non-invasive approach is more cost effective in the long term than invasive blood glucose monitoring devices. In addition, our proposed method has a low computational complexity and can be implemented on consumer-grade hardware such as smartphones and wearable devices.
The major contributions and novelties of this paper are as follows. First, this paper employs the quaternion-valued medians as the features for performing non-invasive blood glucose estimation. Second, this paper proposes performing the denoising by applying the PCA to the quaternion-valued medians instead of performing the denoising on the four-channel PPGs. The computer numerical simulation results demonstrate that our proposed method yields higher regression accuracy than the existing methods for performing non-invasive blood glucose estimation. This demonstrates the effectiveness and robustness of our proposed method. The outline of this paper is as follows. Section 2 briefly reviews the PCA and the quaternion-valued theory. Section 3 presents our proposed method. Section 4 presents the computer numerical simulation results. Finally, Section 5 draws a conclusion.
4. Computer Numerical Simulation Results
4.1. Datasets
In this paper, a four-channel smart watch is used for acquiring the four-channel PPGs. Figure 2a shows the external appearance of the smart watch. Here, two light-emitting diodes (LEDs) emit light at a wavelength of 1450 nm, while the other two LEDs emit light at a wavelength of 1650 nm. Each channel of the PPGs is sampled at 50 Hz. The smart watch transmits the acquired PPGs to the mobile handset. The use of four channels enables the capture of signals at the same two wavelengths (1450 nm and 1650 nm) from different sensor positions on the fingertip. This configuration increases the diversity of the data, helping to improve signal quality by reducing noise and motion artifacts. Acquiring multiple signals at the same wavelengths but from different sensor placements provides complementary information that improves the robustness and accuracy of the blood glucose estimation.
In the experiment, the fingertip is placed on the surface of the NIR sensors for acquiring the PPGs. Figure 2b shows how the device is worn on the wrist during the signal acquisition process. Each acquisition lasts for 60 s. Hence, the length of each channel's PPG is 3000 samples. The BGLs are measured at four fixed times each day. The first measurement is taken at 8:30 am with the subjects in a fasted state. The second measurement is taken at 1:00 pm, one hour after lunch. The third measurement is taken at 4:30 pm, between lunch and dinner, with nothing eaten at tea time. Finally, the fourth measurement is taken at 8:00 pm, one hour after dinner. In order to obtain a wider range of BGLs, closer to the values acquired from various types of subjects, including hypoglycemic, healthy, and hyperglycemic subjects, the diets of the subjects are deliberately altered to include different amounts of carbohydrates. More precisely, the data acquisition process lasted for 12 days and was structured into three phases of 4 days each, with a different amount of carbohydrates in the diet in each phase. In the first phase, the subjects consume a ketogenic diet so that the acquired BGLs are closer to those of hypoglycemic subjects. In the second phase, the subjects consume a standard diet so that the acquired BGLs are closer to those of healthy subjects. In the third phase, the subjects drink 300 mL of cola after consuming a standard diet so that the acquired BGLs are closer to those of hyperglycemic subjects.
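As a rough illustration of how one recording could be arranged as a quaternion-valued signal, the following sketch stores the four channels as the four real components of one quaternion per time sample. The channel-to-component assignment, the function names, and the synthetic data are assumptions made here for illustration; only the sampling rate and the acquisition period come from the text.

```python
import numpy as np

FS_HZ = 50        # sampling rate stated in the text
DURATION_S = 60   # acquisition period stated in the text
N_SAMPLES = FS_HZ * DURATION_S  # 3000 samples per channel

# Synthetic stand-in for one recording: rows are time samples,
# columns are the four PPG channels (not real data).
rng = np.random.default_rng(0)
ppg = rng.standard_normal((N_SAMPLES, 4))

def to_quaternion_signal(ppg_4ch):
    """Interpret each row (c1, c2, c3, c4) as q = c1 + c2*i + c3*j + c4*k.

    The ordering of channels within the quaternion is an assumption.
    """
    arr = np.asarray(ppg_4ch, dtype=float)
    if arr.ndim != 2 or arr.shape[1] != 4:
        raise ValueError("expected an array of shape (n_samples, 4)")
    return arr

def quaternion_norm(q):
    """Norm of each quaternion: square root of the sum of squared components."""
    return np.sqrt(np.sum(q ** 2, axis=1))

q = to_quaternion_signal(ppg)
norms = quaternion_norm(q)
```

Keeping the four components in one array row makes it straightforward to apply operations that act on all channels of a sample jointly.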
In order to evaluate the effectiveness and the robustness of our proposed method, two sets of PPGs were acquired from two different groups of subjects in two different seasons of the year. The first dataset includes 270 measurements, for which the PPGs and the invasive BGLs were acquired from 18 subjects in May 2022. The ratio of male subjects to female subjects was three. The subjects were aged from 18 to 49 years, and their body mass index (BMI) ranged from 19.67 to 26.33. The second dataset includes 490 measurements, for which the PPGs and the invasive BGLs were acquired from eight subjects in December 2022. The ratio of male subjects to female subjects was 2.6. The subjects were aged from 20 to 49 years, and their BMI ranged from 17.85 to 27.76. In both datasets, 70% of the feature vectors were used for training and the remainder for testing.
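A minimal sketch of the data partition is given below. It assumes that the 70% figure means 70% of the feature vectors are used for training and that the split is a random shuffle over measurements; the text does not state whether the split is random or subject-wise, so both the procedure and the function name are hypothetical.

```python
import random

def split_indices(n_items, train_fraction=0.7, seed=0):
    """Shuffle the indices 0..n_items-1 and cut them into train and test parts."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_train = round(train_fraction * n_items)
    return indices[:n_train], indices[n_train:]

# First dataset: 270 measurements; second dataset: 490 measurements.
train_1, test_1 = split_indices(270)
train_2, test_2 = split_indices(490)
```

Under this assumption, the first dataset yields 189 training and 81 test feature vectors, and the second yields 343 and 147.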
4.2. Performance Metrics
This paper employs three performance metrics for evaluating the effectiveness of the various non-invasive blood glucose estimation algorithms: the mean absolute error (MAE), the root mean square error (RMSE), and the MARD. These performance metrics evaluate the discrepancies between the actual BGLs and the corresponding estimated values. More precisely, let $\hat{y}_i$ be the estimated BGL, $y_i$ be the actual BGL, and $N$ be the total number of the test data. The formulas for these performance metrics are
$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|\hat{y}_i - y_i\right|,$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2},$$
and
$$\mathrm{MARD} = \frac{1}{N}\sum_{i=1}^{N}\frac{\left|\hat{y}_i - y_i\right|}{y_i}.$$
Lower values indicate more accurate non-invasive blood glucose estimations. Among these three performance metrics, the MARD is the one most commonly used for evaluating the performance of non-invasive blood glucose estimation algorithms.
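The three metrics can be computed as in the following sketch, which uses their standard definitions with the MARD expressed as a fraction rather than a percentage. The function names and the three sample values are illustrative assumptions, not data from the experiments.

```python
import math

def mae(estimated, actual):
    """Mean absolute error between estimated and actual BGLs."""
    return sum(abs(e - a) for e, a in zip(estimated, actual)) / len(actual)

def rmse(estimated, actual):
    """Root mean square error between estimated and actual BGLs."""
    return math.sqrt(sum((e - a) ** 2 for e, a in zip(estimated, actual)) / len(actual))

def mard(estimated, actual):
    """Mean absolute relative difference, relative to the actual BGL."""
    return sum(abs(e - a) / a for e, a in zip(estimated, actual)) / len(actual)

# Hypothetical values for illustration only.
actual = [5.0, 6.0, 8.0]
estimated = [5.5, 5.4, 8.8]

mae_value = mae(estimated, actual)
rmse_value = rmse(estimated, actual)
mard_value = mard(estimated, actual)
```

Because the MARD normalizes each error by the actual BGL, the same absolute error weighs more heavily at low glucose levels than at high ones.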
Furthermore, the Clarke error grid [22] is also employed for evaluating the effectiveness of the various non-invasive blood glucose estimation algorithms. The estimated BGLs are plotted against the true BGLs, and the grid is divided into five regions, namely, region A to region E [23]. Region A is the region where the estimated BGLs are within an acceptable range of the true BGLs. Since the estimates are accurate, the clinical decisions are unaltered and no clinical risk is imposed on the patients. Region B is the region where the estimated BGLs deviate slightly from the acceptable range of the true BGLs. Although the estimation errors are noticeable, there is still no risk to clinical decisions. Region C is the region where the estimated BGLs deviate greatly from the acceptable range of the true BGLs. Since the estimation errors are large, they may lead to inappropriate clinical decisions. However, the impacts on patient safety are limited. Region D is the region where the estimated BGLs are markedly erroneous compared to the true BGLs. In this case, the clinical errors are substantial and considerable risks are imposed on patient health. Hence, an algorithm whose estimates fall in this region is unreliable for clinical use. Region E is the region where the estimated BGLs are severely inaccurate compared to the true BGLs. In this case, the errors are potentially life threatening. Hence, an algorithm whose estimates fall in this region is entirely unacceptable for clinical use. By evaluating the various non-invasive blood glucose estimation algorithms using the Clarke error grid, researchers and clinicians can assess the risks of employing these algorithms for clinical use and thereby help ensure the safety of the patients.
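The proportion of estimates falling in region A can be computed as sketched below. The sketch uses the commonly cited definition of region A: the estimate is within 20% of the reference value, or both values are below 70 mg/dL. The function names, the sample values, and the choice of mg/dL are assumptions; for values in mmol/L the threshold would be roughly 3.9.

```python
def in_region_a(reference, estimate, low_threshold=70.0):
    """Return True if the (reference, estimate) pair lies in region A.

    low_threshold is the hypoglycemic limit in the same unit as the
    inputs (70 mg/dL by default).
    """
    if reference < low_threshold and estimate < low_threshold:
        return True
    return abs(estimate - reference) <= 0.2 * reference

def region_a_fraction(references, estimates, low_threshold=70.0):
    """Fraction of pairs that fall in region A."""
    hits = sum(
        in_region_a(r, e, low_threshold) for r, e in zip(references, estimates)
    )
    return hits / len(references)

# Hypothetical values in mg/dL for illustration only.
references = [100.0, 150.0, 60.0, 200.0]
estimates = [110.0, 190.0, 65.0, 165.0]
fraction = region_a_fraction(references, estimates)
```

Classifying the remaining pairs into regions B to E requires the full set of grid boundaries and is not attempted here.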
4.3. Performances Yielded by the Random Forest Based on Applying the PCA to the Various Quaternion-Valued Medians
In this section, PCA subspace dimensions 1 through 3 are referred to as PCA-1, PCA-2, and PCA-3, respectively. The four quaternion median formulas introduced in Section 2, specifically Formulas (7)–(10), are represented by X-M, X-K, X-O, and X-G. Each figure comprises 16 box plots, divided into four groups from left to right according to the different quaternion medians. Within each group, the box plots represent the following cases, in order: (i) without PCA processing, (ii) projection into a one-dimensional PCA subspace, (iii) projection into a two-dimensional PCA subspace, and (iv) projection into a three-dimensional PCA subspace.
Figure 3 illustrates the RMSE and MAE values for blood glucose estimation based on these four quaternion-valued medians and their projections into different PCA subspaces using the first dataset. It can be observed that, for each quaternion median, projecting into a one-dimensional subspace (PCA-1) yields the lowest RMSE and MAE median values, along with relatively narrow interquartile ranges, indicating both accuracy and stability. In contrast, the worst performance—both in terms of median error and variability—is observed without PCA processing (i.e., using the original four-dimensional quaternion medians). This supports the idea that applying PCA effectively removes redundant information among the components of the quaternion representations, enhancing the predictive performance. Notably, among all medians, X-G yields the lowest RMSE and MAE values under PCA-1, suggesting its superior effectiveness as a feature extraction strategy.
In addition to the median performance, the variability of the results is also informative. Specifically, the differences between the upper and lower quartiles of RMSE and MAE are significantly smaller under PCA-1 for most quaternion medians (X-M, X-K, and X-O), indicating consistent performance across trials. However, for X-G under PCA-1, while the median is the lowest, the interquartile range is slightly wider, implying that although it provides the best average performance, its variability may be slightly higher than the others.
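The two quantities read off each box plot, the median and the interquartile range, can be computed from a set of per-trial errors as in the sketch below. The error values are hypothetical, and the quartile convention is the default of Python's `statistics.quantiles`, which may differ slightly from the one used to draw the figures.

```python
import statistics

def box_summary(errors):
    """Median and interquartile range of a list of per-trial errors."""
    q1, q2, q3 = statistics.quantiles(errors, n=4)
    return {"median": q2, "iqr": q3 - q1}

# Hypothetical RMSE values over seven trials (illustrative only).
trial_rmse = [1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7]
summary = box_summary(trial_rmse)
```

A low median with a wide interquartile range, as reported for X-G under PCA-1, corresponds to a small `median` together with a comparatively large `iqr`.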
Similarly, Figure 4 displays the RMSE and MAE values for the second dataset. The trends are consistent with those observed in the first dataset: the lowest errors and narrowest interquartile ranges are generally obtained under PCA-1, particularly for X-G. This consistency across datasets confirms that applying PCA to quaternion-valued medians not only improves accuracy but also enhances robustness in non-invasive blood glucose estimation.
4.4. Comparison to the State-of-the-Art Methods
To evaluate the effectiveness of applying the PCA to the quaternion-valued medians, the proposed method is compared with state-of-the-art methods. Here, the quaternion-valued medians are first represented as four-dimensional real-valued feature vectors. Second, these four-dimensional real-valued feature vectors are projected to one-dimensional real-valued numbers. Third, these one-dimensional real-valued numbers are projected back to four-dimensional real-valued feature vectors. As there are four quaternion-valued medians, the dimension of the overall feature vector is 16. Finally, these 16-dimensional overall feature vectors are employed for performing non-invasive blood glucose estimation via the random forest.
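The projection and back-projection steps can be sketched as follows. The sketch assumes a standard mean-centered PCA computed via the singular value decomposition, fitted separately for each of the four medians; the function name and the synthetic inputs are hypothetical, and the quaternion medians themselves (Formulas (7)–(10)) are replaced by random placeholders.

```python
import numpy as np

def pca_round_trip(features, n_components=1):
    """Project rows onto the leading principal directions and map them back.

    features: array of shape (n_recordings, 4), one row per recording.
    Returns an array of the same shape whose centered rows lie in an
    n_components-dimensional subspace.
    """
    x = np.asarray(features, dtype=float)
    mean = x.mean(axis=0)
    centered = x - mean
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]
    scores = centered @ basis.T   # low-dimensional representation
    return scores @ basis + mean  # back in four dimensions

# Synthetic placeholders for the four quaternion-valued medians, each
# already written as a 4-D real vector per recording (not real data).
rng = np.random.default_rng(0)
n_recordings = 50
medians = [rng.standard_normal((n_recordings, 4)) for _ in range(4)]

denoised = [pca_round_trip(m, n_components=1) for m in medians]
overall = np.hstack(denoised)  # shape (n_recordings, 16)
```

The rows of `overall` would then serve as the inputs of the regressor. Whether the principal directions are estimated on the training set only is not stated in the text and is left open here.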
These 16-dimensional overall feature vectors are compared to the feature vectors employed in the state-of-the-art methods for performing non-invasive blood glucose estimation. In particular, the time-domain statistical features are extracted from three PPGs acquired using three different LEDs with three different wavelengths [4]. Moreover, the aforementioned features including the logarithmic energy entropy, the KTE, the spectral entropy, and the AR coefficients are extracted from the PPGs [5]. Furthermore, a variety of the time-domain and the frequency-domain features such as the zero crossing rate, the auto-correlation coefficients, the power spectral density coefficients, the KTE, the spectral coefficients, the wavelet coefficients and the AR coefficients are extracted from the PPGs [7]. In addition, the MFCCs are extracted from the PPGs [8]. Finally, the time-domain HRV features and the frequency-domain HRV features are extracted from the PPGs [9].
Table 1 and Table 2 show the results yielded by these methods using SVR based on the PPGs in the first dataset and the second dataset, respectively. Similarly, Table 3 and Table 4 present the corresponding results obtained using the random forest (RF) model. Across both datasets, the proposed method consistently achieves the lowest MARD, RMSE, and MAE values compared to existing approaches, regardless of the regression model employed. Furthermore, the results obtained under the RF model exhibit lower error metrics in the majority of experimental settings relative to those under SVR, suggesting that RF may offer enhanced predictive capability for non-invasive blood glucose estimation within the context of this study. These findings collectively demonstrate the effectiveness and robustness of the proposed method, as well as the relative advantage of adopting RF over SVR in this application.
Figure 5 illustrates the Clarke error grid analysis of the proposed method applied to the first dataset using two different regression models: SVR in subfigure (a) and RF in subfigure (b). Similarly, Figure 6 presents the corresponding results on the second dataset. As observed from the Clarke error grids, the predicted points generated by the RF-based model are more densely distributed along the diagonal line compared to those generated by the SVR-based model, indicating a stronger correlation between the estimated and reference blood glucose values. Moreover, a higher proportion of points fall within region A when using the RF model, further demonstrating the effectiveness and robustness of the proposed method.
4.5. Discussion
In this study, we proposed a quaternion-valued framework for non-invasive blood glucose estimation based on multi-channel PPG signals. Our method integrates four-channel PPG inputs into a quaternion-valued signal structure, enabling the joint representation of inter-channel information. The results from two datasets demonstrate that the proposed method significantly outperforms conventional multi-channel approaches, achieving lower MARD, RMSE, and MAE values, along with higher proportions of estimations falling within region A of the Clarke error grid.
The observed performance improvements can be attributed to two key factors. First, the quaternion representation effectively preserves the spatial and temporal relationships among multiple PPG channels, which are often ignored when channels are processed independently. This is particularly important given the inherent correlations among synchronously acquired multi-channel signals. By modeling these inter-channel interactions within a compact four-dimensional framework, our method captures richer and more discriminative features that contribute to improved regression accuracy. Second, the application of PCA on quaternion-derived features not only reduces dimensionality and suppresses noise but also helps highlight the most informative signal components, thereby enhancing robustness and generalization across datasets.
Despite the promising results, this study has several limitations. First, the requirement for four-channel PPG acquisition may limit the applicability of the method to simpler or lower-cost hardware systems, especially in wearable or portable settings. Second, although the datasets used in our experiments are diverse and include multiple subjects, they may not fully represent the variability found in broader populations. Additionally, our evaluation was conducted under relatively controlled conditions, and the impact of real-world noise sources such as motion artifacts, skin tone variability, or sensor misalignment was not extensively tested. These factors may influence model performance in practical deployments.
Looking ahead, the proposed quaternion-based framework has the potential to be extended to other non-invasive health monitoring applications. In particular, physiological markers such as blood pressure and blood lipid levels are also influenced by multi-channel signals, including multi-wavelength PPG or multimodal biosensing data. Given that inter-channel coupling is also relevant in these domains, the quaternion representation could offer similar benefits. Future work will focus on evaluating the adaptability of this method to such tasks, validating performance under real-world conditions, and exploring its integration into embedded systems for continuous and real-time health monitoring.
5. Conclusions
This paper employs quaternion-valued medians as features for performing non-invasive blood glucose estimation. In addition, PCA is employed for suppressing the noise in the quaternion-valued medians. First, the four-channel PPGs are acquired and they are used to form the quaternion-valued PPGs. Second, the existing quaternion-valued medians are computed and they are mapped to the four-dimensional real-valued vectors. Third, PCA is used to project these four-dimensional real-valued vectors to the low-dimensional real-valued vectors. Then, the low-dimensional real-valued feature vectors are mapped back to the four-dimensional real-valued feature vectors. Finally, the random forest is used for performing the blood glucose estimation. Two datasets are employed for evaluating the effectiveness of our proposed method. The computer numerical simulation results show that our proposed method yields an MARD value, RMSE value, MAE value, and percentage of the pairs of the estimated BGLs and the real BGLs falling in region A of the Clarke error grid at 0.1498, 1.2175, 0.9586, and 77.14%, respectively, for the PPGs in the first dataset as well as 0.1369, 1.1445, 0.8572, and 81.38%, respectively, for the PPGs in the second dataset. Compared to the existing methods, our proposed method is more effective and robust.