Overcoming Individual Discrepancies, a Learning Model for Non-Invasive Blood Glucose Measurement

Abstract: Non-invasive Glucose Measurement (NGM) technology is of great value for the blood glucose management of patients with hyperglycemia or hypoglycemia. Individual Discrepancies (IDs), e.g., skin thickness and color, not only hinder the development of NGM but are also the reason why NGM is not yet widely used. To solve this problem, we design an individually customized NGM model that measures these discrepancies with multiple wavelengths and tunes the parameters for glucose estimation. In this paper, an NGM prototype is designed, and a learning model for glucose estimation with automatic parameter tuning based on Independent Component Analysis (ICA) and Random Forest (RF) is presented. The clinical trial shows that the correlation coefficient between the estimated and the reference Blood Glucose Concentration (BGC) reaches 0.5 after merely 10 training measurements and rises to 0.8 after about 60 training measurements.


Introduction
Diabetes mellitus is a metabolic disorder of multiple aetiologies characterized by chronic hyperglycaemia with disturbances of carbohydrate, fat, and protein metabolism resulting from defects in insulin secretion, insulin action, or both [1]. The main harm of diabetes mellitus comes from its many complications, of which diabetic nephropathy is one of the most damaging. The World Health Organization (WHO) estimates that more than 180 million people worldwide have diabetes and that this number is likely to exceed 350 million by 2030 [2].
Blood glucose measurement technology is necessary for people whose blood glucose is unbalanced to control their glucose level. According to recent research, continuous glucose monitors (CGM) can greatly improve the control of blood glucose [3]. James C. Boyd has pointed out that a blood glucose measurement with an error of less than 10% every 5 min can significantly reduce the harm caused by hyperglycemia or hypoglycemia [4]. However, most of the blood glucose meters currently available on the market are invasive and require a finger puncture to collect blood as the measurement sample. This method causes pain and is particularly inconvenient for young children, which leads to inefficient self-monitoring and makes it unsuitable for continuous glucose monitoring.
In order to reduce the trauma to patients during blood glucose measurement and to increase the frequency of glucose monitoring, some medical device companies began to study mini-invasive or non-invasive blood glucose meters. In recent years, some devices have even obtained Conformité Européenne (CE) or U.S. Food and Drug Administration (FDA) certification. The most famous product is the Abbott FreeStyle™ Libre [5,6], a mini-invasive blood glucose meter attached to the patient's arm, which was approved by the FDA in September 2017. Although FreeStyle™ can provide 24-h continuous blood glucose monitoring, it still uses a micro-needle to puncture the skin, and long-term wear can easily lead to wound infection. Moreover, the FreeStyle™ sensor needs to be replaced every 14 days, which makes the measurement costly. In addition, this mini-invasive glucose meter measures the glucose concentration of the subcutaneous interstitial fluid instead of blood. Numerous studies have pointed out that there is a time lag between the glucose concentration in the blood and that in the interstitial fluid [7,8], which lowers the accuracy during blood glucose fluctuations or hypoglycemia and makes it difficult to raise a prompt alert. For these reasons, research on non-invasive glucose measurement (NGM) technology has become one of the most popular topics in the medical and sensor fields.
NGM has been studied for more than 30 years, but its methods are still not systematic. Fortunately, many technologies have been applied to make the measurement more accurate. The most common NGM approaches are optical, thermodynamic, and other chemical or physical methods [9]. Although no method yet reaches the accuracy required for clinical application, progress has been made on some important aspects, e.g., the relation between glucose concentration and Raman spectra acquired from physical tissue has been demonstrated in [10]. The major optical techniques are based on spectrum analysis of subcutaneous structures in the near or mid infrared (0.8-2.5 µm), in which the absorbance of the glucose bonds (C−H, O−H) is strong [11]. With the development of artificial intelligence, many studies combine physical and physiological data with machine learning algorithms to predict glucose levels [12]. In Table 1 and Section 2, we detail and analyze the mini-invasive and non-invasive blood glucose meters currently available on the market.
Overcoming Individual Discrepancies (IDs) is one of the key technical issues in NGM by spectrum analysis. Along the light propagation path in skin tissue, many individual factors can affect the final spectrum and the glucose measurement result, e.g., the thickness and color of the skin tissue, cuticle, epidermis, dermis and subcutaneous tissue, muscle, bone, etc. [13]. In the non-invasive optical measurement of other blood components, e.g., blood oxygen, hemoglobin, etc., some methods have been proposed that are able to eliminate the interference of IDs. The dynamic spectroscopy method is able to extract the information of the blood components while removing background and noise at the same time [14]. Double-sampling is used to improve the signal-to-noise ratio of dynamic spectroscopy [15]. In the field of NGM, different measurement methods need different ways to overcome IDs, which requires further research.
In this paper, a learning NGM model that can overcome IDs to a certain extent is presented. The model is based on the light absorbance of the human earlobe at five different wavelengths and a glucose estimation algorithm with automatic parameter tuning for individuals. The earlobe was chosen because it contains no bone tissue and is relatively thin [16]. To measure the blood glucose level, we choose three wavelengths (730, 850, and 930 nm) in the near-infrared range whose absorbance mainly depends on the amount of glucose in blood. In order to overcome the IDs, green (490 nm) and red (660 nm) light is used to estimate the tissue thickness and the blood volume, respectively, to compensate the measurements [12]. Based on the linear relation between the absorbances of the five lights and the blood components, we use Independent Component Analysis (ICA) [17] to extract independent components from the transmittance data and use them as new features. The glucose concentration is then estimated with Random Forest (RF) [18], and the measurement accuracy is tested against a reference invasive glucose meter. ICA is a method for separating complex signals into independent signals by matrix decomposition, and it is frequently used in signal processing [17]. RF is a machine learning model that can be used for black-box modeling when the underlying mechanism is not clear [18]. The prototype test results, evaluated by Clarke Error Grid Analysis (EGA) [19], show the effectiveness of this approach.
The rest of this paper is structured as follows: In Section 2, some blood glucose meter products are discussed. Section 3 analyzes the measurement theory that supports NGM through the absorbance spectrum. Section 4 describes the circuit of the prototype that acquires the light absorbance data. Section 5 presents the glucose estimation model based on ICA and RF. Section 6 shows the measurement accuracy results with EGA and discusses the convergence speed of our proposal. Finally, this paper is concluded in Section 7.

Related Works
In this section, we list the mini-invasive and non-invasive blood glucose meters currently certified by CE or FDA and introduce them in detail, including their measurement principles, advantages, and disadvantages. An overall comparison is shown in Table 1. It is worth mentioning that since FreeStyle™ Libre, Symphony™, and G5/G6™ measure the interstitial glucose concentration rather than the blood glucose concentration, some scholars do not consider these devices to be blood glucose meters. In this paper, considering that the original intention of these devices is to monitor the blood glucose level by monitoring the interstitial glucose concentration, we classify them as mini-invasive blood glucose meters, but readers should be aware that they are not true blood glucose meters.

Mini-Invasive
FreeStyle™ Libre: This is a wearable continuous glucose monitoring device from Abbott. It consists of a coin-sized disposable circular sensor that is attached to the upper arm by a miniature needle 5.0 mm long and 0.4 mm wide and a small adhesive tape. The user can obtain his/her blood glucose level through a handheld device. It should be noted that this product measures the glucose concentration of the interstitial fluid, which lags behind the blood glucose level. As a result, a finger prick test using a blood glucose meter is required during times of rapidly changing glucose levels, when interstitial fluid glucose levels may not accurately reflect blood glucose levels, or if hypoglycemia or impending hypoglycemia is reported but the symptoms do not match the system readings [6]. In addition, the FreeStyle™ Libre sensor needs to be replaced every 14 days to avoid wound infection.
Symphony™: This mini-invasive continuous glucose monitoring device was developed by Echo Therapeutics, Inc. A special skin preparation device is used to permeate the skin before placing the sensor. The device abrades the skin, removing about 0.015 mm of the outer layer using adaptive micro-abrasion technology. This process takes 10-20 s, and removing the outer layer of skin allows a number of physiological properties to be measured, including interstitial glucose levels. All the measured values are transmitted wirelessly to a remote monitor that raises an alarm if the glucose level is outside the normal range. Like FreeStyle™ Libre, Symphony™ measures interstitial glucose levels and therefore does not accurately reflect blood glucose levels.
G5/G6™: Dexcom G5/G6™ is a patch device composed of three parts: a sensor that measures glucose in the fluid under the skin, a processor that is embedded on the sensor and transmits the data to the receiver every five minutes, and a receiver that displays the blood glucose level to the user [20]. The measurement principle of this device is similar to that of FreeStyle™ and Symphony™, and the device is indicated by the FDA for use both as a standalone CGM and for integration into automated insulin dosing (AID) systems [21].
GlucoWatch™: This is a watch-type glucose meter that contacts the human skin through an electrode, which applies a micro current to the skin. The charged ions and glucose in the subcutaneous interstitial fluid migrate to the skin surface under the action of the electric field, and the glucose concentration is measured by glucose oxidase. This device places high requirements on the physiological condition of the human body: factors such as skin sweating, environmental temperature, and electrostatic interference affect the measurement results. Due to its low accuracy, the FDA approval of GlucoWatch™ has been revoked, and products that have already been sold are required to be recalled.

Non-Invasive
Google contact lens: This is a smart contact lens project announced by Google on 16 January 2014. The project aims to assist people with diabetes by constantly measuring the glucose levels in their tears. Tears flow into the monitor through a small hole in the lens, which contains a sensor that measures the glucose level in the tears. The users obtain their glucose levels by observing the change of the color of the lens. Unfortunately, Google announced the termination of the project on 16 November 2018 due to the large measurement error and difficult sample collection [22].
TensorTip™ CoG: This is a learning non-invasive glucose meter, developed by Cnoga Medical Ltd, that requires more than 100 invasive and more than 50 non-invasive data calibrations before use. The basis for measuring blood glucose is the absorption spectrum of the finger.
HG1-c: MediSensors, a company based in San Jose, California, USA, developed a compact continuous NGM device, the HG1-c, which is based on Raman spectroscopy [23].
GlucoTrack™: This device consists of two parts, a handheld monitor and an ear-clip sensor, and integrates three measurement techniques: ultrasonic, electrical, and thermal. The ultrasonic technique measures the velocity of sound waves passing through the earlobe, which is affected by the concentration of blood glucose in the capillaries of the earlobe; the electrical technique measures the change in conductivity of the tissue; and the thermal technique obtains the heat transfer features of the tissue.
SugarTrac™: This device is based on the infrared spectrum. It is inserted in the ear canal, emits light of different wavelengths toward the eardrum, and analyzes the reflected signals to calculate glucose levels.

Theoretical Analysis
The blood glucose meters analyzed above show that some mini-invasive glucose meters, despite having obtained FDA certification, still have many problems. As for non-invasive blood glucose meters, none of them has received FDA certification. One of the key reasons is that individual discrepancies (IDs) affect the measurement signal and lower the accuracy. Next, we introduce our research work on NGM technology, i.e., a learning model for overcoming IDs. In this section, an earlobe model for blood glucose concentration estimation is established. Based on this model, the wavelength selection rules are introduced.

Earlobe Model
The measuring principle of BGC in this paper is based on the Beer-Lambert law [24], which relates the attenuation of light to the properties of the material through which it travels. The Beer-Lambert law can be described as
$$a(\lambda) = \sum_{i} \varepsilon_i(\lambda)\, c_i\, l \qquad (1)$$
$$a(\lambda) = \log_{10}\frac{I_0}{I} \qquad (2)$$
where a(λ) is the absorbance of light at wavelength λ, ε_i(λ) is the molar attenuation coefficient of the attenuating species i in the material sample for light of wavelength λ, c_i is the amount concentration of the attenuating species i in the material sample, l is the path length of the beam of light through the material sample, and I_0 and I represent the intensity of the incident and the output light.
Taking the earlobe as the material sample, the law can be written as
$$a(\lambda) = l \sum_{i=1}^{N} \varepsilon_i(\lambda)\, c_i \qquad (3)$$
where l is the equivalent thickness of the earlobe, which varies from person to person. There are many attenuating species in the earlobe, e.g., glucose, water, fat, hemoglobin, etc. These species are represented by the index i in Equation (3), and i = 1 represents glucose. It is worth noting that the concentrations of these species differ between individuals and change with time. These individual discrepancies should be regarded as unknown variables. According to linear algebra, multiple unknowns cannot be solved from one equation. To solve for c_i (i = 1, 2, ..., N), we need at least N equations, which means that the absorbances at N different wavelengths have to be measured to set up the system of linear equations
$$a(\lambda_j) = l \sum_{i=1}^{N} \varepsilon_i(\lambda_j)\, c_i, \qquad j = 1, 2, \ldots, N \qquad (4)$$
which can be written in matrix form as
$$A = \Theta C \qquad (5)$$
where A = [a(λ_i)]_{N×1} is the absorbance vector, C = [c_i]_{N×1} is the concentration vector, and Θ = [l ε_j(λ_i)]_{N×N} is the coefficient matrix. In theory, as long as the absorbance vector A has been measured, the blood glucose concentration c_1 can be worked out by Cramer's Rule as
$$c_1 = \frac{\det(\Theta_1)}{\det(\Theta)} \qquad (6)$$
where Θ_1 is the matrix formed by replacing the first column of Θ by the absorbance vector A.
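As a small illustration of solving the matrix form A = ΘC for the glucose concentration with Cramer's Rule, the following sketch uses NumPy. The 3×3 coefficient matrix and the concentrations are made-up numbers chosen only for the example, not measured values, and the function name `glucose_by_cramer` is hypothetical.

```python
import numpy as np


def glucose_by_cramer(theta, a):
    """Return c_1 = det(Theta_1) / det(Theta).

    Theta_1 is Theta with its first column replaced by the absorbance vector a.
    """
    theta = np.asarray(theta, dtype=float)
    theta_1 = theta.copy()
    theta_1[:, 0] = a  # replace the first column by A
    return np.linalg.det(theta_1) / np.linalg.det(theta)


# Hypothetical coefficient matrix for three attenuating species (illustration only).
theta = np.array([[2.0, 0.3, 0.1],
                  [0.2, 1.5, 0.4],
                  [0.1, 0.2, 1.8]])
c_true = np.array([0.5, 1.0, 2.0])   # species 1 plays the role of glucose
a = theta @ c_true                   # absorbances that these concentrations would produce

c1 = glucose_by_cramer(theta, a)
c_all = np.linalg.solve(theta, a)    # solving the full system gives the same c_1
```

In practice a general linear solver such as `np.linalg.solve` is usually preferred over explicit determinants; Cramer's Rule is shown here only because it mirrors the derivation in the text.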
An essential question remains: how to determine the coefficient matrix Θ, which varies from person to person. We address this question with ICA and RF in Section 5.

Wavelength Selection
The choice of wavelengths is very important for NGM, because an inappropriate wavelength combination makes Equation (5) an Ill-Conditioned Equation (ICE) [25] whose solution is extremely unstable. With an ICE, a tiny interference in vector A or matrix Θ causes a huge error in the solution of Equation (5). Unfortunately, such interference often occurs in NGM and reduces the robustness of the measurement.
In order to improve the accuracy and robustness of NGM, ICE should be avoided as much as possible. The solution is to make k(Θ) in Equation (7) as large as possible [25].
k(Θ) will be large enough if the wavelength selection follows this principle: ε_j(λ_i) in Θ should be as large as possible if i = j, and as small as possible otherwise. In other words, the wavelength selection should ensure that the absorbance of component i is large at the wavelength λ_i and small at the other wavelengths.
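The effect of ill-conditioning can be seen in a toy experiment. The sketch below compares a coefficient matrix in which each species dominates its own wavelength with one in which two wavelengths respond almost identically. Both matrices and the perturbation are made-up values. The sketch reports NumPy's standard condition number (`np.linalg.cond`), for which smaller means better conditioned; this is an assumption made for illustration and is not necessarily the quantity k(Θ) of Equation (7).

```python
import numpy as np


def relative_error_after_perturbation(theta, c_true, delta):
    """Solve (A + delta) = Theta C and return the relative error of the solution."""
    a = theta @ c_true
    c_est = np.linalg.solve(theta, a + delta)
    return np.linalg.norm(c_est - c_true) / np.linalg.norm(c_true)


# Hypothetical 2x2 coefficient matrices (illustration only).
well = np.array([[1.0, 0.1],
                 [0.1, 1.0]])    # large diagonal, small off-diagonal entries
ill = np.array([[1.0, 0.99],
                [0.99, 1.0]])    # both wavelengths see nearly the same mixture

c_true = np.array([1.0, 1.0])
delta = np.array([0.01, -0.01])  # small interference on the measured absorbances

err_well = relative_error_after_perturbation(well, c_true, delta)
err_ill = relative_error_after_perturbation(ill, c_true, delta)
cond_well = np.linalg.cond(well)
cond_ill = np.linalg.cond(ill)

print(f"well-conditioned: cond={cond_well:.2f}, relative error={err_well:.4f}")
print(f"ill-conditioned:  cond={cond_ill:.2f}, relative error={err_ill:.4f}")
```

The same interference that barely disturbs the diagonal-dominant system is strongly amplified in the nearly collinear one, which is the behavior the wavelength selection principle tries to avoid.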
Figure 1 shows the spectrum of glucose from 500 nm to 1000 nm, in which there is an absorption peak at 930 nm. In order to eliminate the interference of other components, the spectrum is obtained from an aqueous glucose solution with the water background subtracted. Other components of the skin tissue have a lower absorbance in the near-infrared region (780-2526 nm) [11]. Therefore, the wavelength λ_0 used to measure glucose is selected as 930 nm. The changes of blood volume, oxyhemoglobin (Oxy-Hb), and deoxyhemoglobin (Deoxy-Hb) can be measured by the dual wavelengths of 850 nm and 660 nm [26]. The attenuation by water is stronger at 730 nm and weaker at 490 nm. Meanwhile, fat is the main component that affects the tissue light attenuation at 490 nm, so this wavelength can be used to measure the thickness of the earlobe. Accordingly, five wavelengths (490 nm, 660 nm, 730 nm, 850 nm, and 930 nm) are selected for blood glucose estimation.

Circuit of Blood Glucose Sensor
Earlight is the blood glucose sensor used in this paper for collecting absorbance data at different wavelengths. In this section, we focus on the circuit design of the Earlight, which is composed of three parts, i.e., light emitter, light receptor, and controller, as shown in Figure 2a.
The light emitter emits the incident light. It contains five LED light sources of different wavelengths (490 nm, 660 nm, 730 nm, 850 nm, and 930 nm) and has a power dissipation of 120 mW. The LEDs are connected in a shared-anode configuration (Figure 2b).
The light receptor receives the transmitted light and measures its intensity. It includes two parts, a current-to-voltage converter (I-V converter) and a filter amplifier circuit (F-A circuit), as shown in Figure 2c. In the I-V converter, a silicon photodiode (PD) whose spectral response spans from 300 nm to 1100 nm produces a current proportional to the received light intensity, and this current signal is transformed to a voltage signal (Vout_1) by a MAX471 (Maxim, San Jose, CA, USA), which is a current-sensing IC. In the F-A circuit, Vout_1 is filtered by a first-order low-pass filter and amplified by a proportional amplifier to obtain V_return, which can be fed into an analog-to-digital converter (ADC).
The controller controls the LEDs and calculates the measurement result. It is an STM32F103 (STMicroelectronics, Geneva, Switzerland) single-chip microcomputer minimum system with a small liquid-crystal display. The system contains a 12-bit ADC that is used to measure the voltage signal produced by the light receptor, from which the absorbance of light is calculated according to Equation (2). There are two reasons for using a 12-bit ADC: (1) A 12-bit ADC provides 4096 levels, and we found that the acquired voltage signal varies from 0 to 3.6 V with an accuracy of 0.001 V, which requires (3.6 − 0)/0.001 = 3600 levels, so the 4096 levels of a 12-bit ADC are sufficient. (2) The STM32F103 microcontroller we use has a built-in 12-bit ADC, so there is no need to add an external ADC, which reduces the cost. The use of the blood glucose sensor prototype is demonstrated in Figure 3. With this prototype, the absorbance data can be collected; the algorithm that estimates the blood glucose concentration from these data is presented in Section 5.
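The resolution argument and the absorbance calculation can be checked with a few lines of Python. This is only a sketch: it assumes that the ADC full scale equals the 3.6 V signal range, that the receptor voltage is proportional to the light intensity, and that a reference voltage for the incident light is available; the function names are hypothetical and do not come from the prototype firmware.

```python
import math

ADC_BITS = 12
V_FULL_SCALE = 3.6  # assumed full-scale voltage, taken from the 0-3.6 V signal range


def adc_levels(bits=ADC_BITS):
    """Number of distinct codes of an ADC with the given bit width."""
    return 2 ** bits


def counts_to_volts(counts, v_full_scale=V_FULL_SCALE, bits=ADC_BITS):
    """Convert a raw ADC code to volts (assumes a linear, zero-offset converter)."""
    return counts * v_full_scale / (adc_levels(bits) - 1)


def absorbance(v_incident, v_transmitted):
    """a = log10(I0 / I), assuming voltage is proportional to light intensity."""
    if v_incident <= 0 or v_transmitted <= 0:
        raise ValueError("voltages must be positive")
    return math.log10(v_incident / v_transmitted)


required_levels = round((3.6 - 0.0) / 0.001)     # levels needed for 1 mV steps
enough = adc_levels() >= required_levels         # 4096 codes cover the requirement

# Hypothetical readings: incident reference at code 3500, transmitted light at code 350.
a_example = absorbance(counts_to_volts(3500), counts_to_volts(350))
```

With these made-up readings the transmitted intensity is one tenth of the incident intensity, so the absorbance is 1.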

Figure 3. Demonstration of the prototype blood glucose sensor (labeled parts: light emitter, light receptor, earlobe). The sensor clamps the earlobe from the upper and lower sides so that the incident light penetrates the tissue and the transmitted light is received by the light receptor.

Algorithm of NGM
This section introduces the algorithm that builds an Independent Feature Converter (IFC) and a Glucose Estimator (GE), which together calculate the Blood Glucose Concentration (BGC) from the absorbance data collected by the sensor presented in Section 4. The process of adjusting the parameters of the IFC and the GE is called the training process. It needs a training dataset in which each sample has six attributes: five absorbance attributes and one reference BGC attribute. The schematic diagram of this algorithm is shown in Figure 4.

Independent Feature Converter
The IFC transforms the absorbance vector A into the independent component concentration vector C*. According to Equation (5), any component concentration can be approximately calculated by multiplying the inverse of the coefficient matrix, Θ^{−1}, with the absorbance vector A. The key is to determine the coefficient matrix Θ, which is hard to measure and, to some extent, varies from person to person. Although Θ cannot be determined accurately, an approximate matrix can be obtained by Independent Component Analysis (ICA) [17] from the training dataset.
ICA is an effective linear-transformation algorithm from multivariate statistics and linear algebra in which a matrix is factorized into two matrices such that independent components are extracted from a complex data matrix [17]. These independent components make the resulting matrices easier to inspect. In this paper, the two resulting matrices represent the independent feature matrix F and the coefficient matrix Θ, respectively. The ICA algorithm used in this paper is the fast fixed-point algorithm [27], which uses maximum entropy to determine the factorization.
ICA rests on the basic premise that all the component concentrations are independent. This premise does not hold strictly, which introduces some error into the BGC estimate, but it is very useful, and the error can be compensated by the subsequent GE.
First, we build an absorbance data matrix T from the training dataset as
$$T = \begin{bmatrix} A_1^{T} \\ A_2^{T} \\ \vdots \\ A_m^{T} \end{bmatrix} \qquad (8)$$
where A_i^T is the transpose of the absorbance vector of the i-th record in the training dataset and m is the number of records. Based on Equation (5), the matrix T can be factorized into F and Θ as
$$T = F\,\Theta^{T} \qquad (9)$$
$$F = \begin{bmatrix} C_1^{T} \\ C_2^{T} \\ \vdots \\ C_m^{T} \end{bmatrix} = \left[ c_j(i) \right]_{m \times N} \qquad (10)$$
where C_i^T is the transpose of the concentration vector of the i-th record, and c_j(i) is the j-th component concentration of the i-th record. F is called the independent feature matrix and cannot be measured directly. Matrix Θ is the same coefficient matrix as in Equation (5).
Now that the physical significance of the decomposition of matrix T has been explained, we discuss how to decompose it. The principle of the decomposition is to make matrix F contain as much information as possible. Considering A and C as random variables, information theory states that matrix F contains the most information when the random variable C obeys a Gaussian distribution. However, [17] has pointed out that the signal analyzed by ICA cannot follow a Gaussian distribution and has proposed an alternative, the sigmoid function. Let C follow a distribution whose probability density function (PDF) is the first-order derivative of the sigmoid function:
$$f_c(C) = \prod_{j=1}^{N} \mathrm{sigmoid}'(c_j) \qquad (11)$$
where f_c(C) is the PDF of vector C and sigmoid′(·) is the first-order derivative of the sigmoid function, which takes the place of the Gaussian density.
Combining Equations (5) and (11), the PDF of vector A is
$$f_A(A) = \left| \det \Theta^{-1} \right| \prod_{j=1}^{N} \mathrm{sigmoid}'(\theta_j A) \qquad (12)$$
and the coefficient matrix Θ is then estimated by maximum likelihood estimation (MLE). The log-likelihood function is
$$L(\Theta) = \sum_{i=1}^{m} \left( \sum_{j=1}^{N} \log \mathrm{sigmoid}'(\theta_j A_i) + \log \left| \det \Theta^{-1} \right| \right) \qquad (13)$$
where θ_j is the j-th row of Θ^{−1}. Using the Adam method [28], an approximation of the maximum point Θ can be obtained. Adam is an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. Adam is well suited for optimization problems that are large in terms of data or parameters. The iterative formulas are
$$\begin{aligned} m_t &= \beta_1 m_{t-1} + (1 - \beta_1)\, \nabla L(\Theta_{t-1}) \\ v_t &= \beta_2 v_{t-1} + (1 - \beta_2)\, \left( \nabla L(\Theta_{t-1}) \right)^2 \\ \hat{m}_t &= \frac{m_t}{1 - \beta_1^{t}}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^{t}} \\ \Theta_t &= \Theta_{t-1} + \alpha\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon} \end{aligned} \qquad (14)$$
where ∇L(Θ) is the gradient of L(Θ), the square and the division are taken element-wise, and t is the number of iterations. m and v are the first and second moment estimates, which are initialized at zero, and β_1 and β_2 are the momentum parameters ranging from 0 to 1. α is the iteration step, known as the learning rate, and ϵ is a small constant that prevents division by zero. The update adds the step because L(Θ) is maximized. m̂ and v̂ are the bias-corrected moments, and the correction is only relevant in the first few iterations, when t is small. It compensates for the fact that m and v are initialized at zero and need some time to "warm up". Calculating Θ directly from all of the data in T requires much computing and memory resource. In practice, we randomly select a small subset of T to calculate the gradient of L(Θ) at each iteration. This is called the mini-batch strategy, and the size of the subset is called the batch size. The pseudo-code of ICA based on Adam and mini-batch is given in Algorithm 1.

Algorithm 1 ICA based on Adam and Mini-batch
Require: T, α, β_1, β_2, batch_size and max_iteration
Ensure: Θ
  Initialize Θ′; m ← 0; v ← 0; L ← −∞
  for t = 1 to max_iteration do
    T′ ← Randomly select a subset of size batch_size from T
    Compute the gradient of L(Θ′) on T′
    Update m, v, m̂, v̂ and Θ′ by the Adam iterative formulas
    if L(Θ′) > L then
      L ← L(Θ′)
      Θ ← Θ′
    end if
  end for
  return Θ

The IFC based on ICA, represented by the matrix Θ^{−1}, has now been built, and the absorbance vector A can be transformed to the independent component concentration vector C* by Equation (15):
$$C^{*} = \Theta^{-1} A \qquad (15)$$
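The training loop of Algorithm 1 can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions rather than the implementation used in the paper: it optimizes the unmixing matrix W = Θ^{−1} directly, uses the sigmoid-derivative source density with a mean (rather than summed) log-likelihood, keeps the best iterate seen so far, and runs on synthetic Laplace-distributed sources. All function names and default hyperparameters are hypothetical.

```python
import numpy as np


def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))


def log_likelihood(W, T):
    """Mean log-likelihood of the rows of T for the unmixing matrix W = Theta^{-1}."""
    g = sigmoid(T @ W.T)                          # row i holds sigmoid(W @ A_i)
    log_pdf = np.log(np.clip(g * (1.0 - g), 1e-300, None)).sum(axis=1)
    return log_pdf.mean() + np.log(abs(np.linalg.det(W)))


def gradient(W, T):
    """Gradient of the mean log-likelihood with respect to W."""
    g = sigmoid(T @ W.T)
    # d/ds log sigmoid'(s) = 1 - 2 sigmoid(s); d/dW log|det W| = (W^T)^{-1}
    return (1.0 - 2.0 * g).T @ T / len(T) + np.linalg.inv(W.T)


def ica_adam(T, alpha=0.01, beta1=0.9, beta2=0.999, batch_size=16,
             max_iteration=2000, eps=1e-8, seed=0):
    rng = np.random.default_rng(seed)
    n = T.shape[1]
    W = np.eye(n)
    m = np.zeros((n, n))                          # first moment estimate
    v = np.zeros((n, n))                          # second moment estimate
    best_W, best_L = W.copy(), log_likelihood(W, T)
    for t in range(1, max_iteration + 1):
        idx = rng.choice(len(T), size=min(batch_size, len(T)), replace=False)
        grad = gradient(W, T[idx])                # mini-batch gradient
        m = beta1 * m + (1 - beta1) * grad
        v = beta2 * v + (1 - beta2) * grad * grad
        m_hat = m / (1 - beta1 ** t)              # bias corrections
        v_hat = v / (1 - beta2 ** t)
        W = W + alpha * m_hat / (np.sqrt(v_hat) + eps)   # ascent step (maximization)
        L = log_likelihood(W, T)
        if L > best_L:                            # keep the best iterate
            best_W, best_L = W.copy(), L
    return best_W, best_L


# Synthetic demonstration: three independent sources mixed by a made-up matrix.
rng = np.random.default_rng(1)
sources = rng.laplace(size=(200, 3))
mixing = np.array([[1.0, 0.5, 0.2],
                   [0.3, 1.0, 0.4],
                   [0.2, 0.3, 1.0]])
T = sources @ mixing.T                            # rows are absorbance-like vectors
W_best, L_best = ica_adam(T, max_iteration=500)
C_star = T @ W_best.T                             # C* = Theta^{-1} A for every record
```

As with any ICA, the recovered components are determined only up to permutation and scaling, which is one reason a downstream estimator is still needed to map C* to a glucose value.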

Glucose Estimator
The function of the GE is to estimate BGC while reducing the errors and differences between the independent component concentration vector C* and the true component concentration vector C. These errors and differences are caused by the weak independence premise made in the IFC, inaccurate measurement, and other unexpected factors. The input and output of the GE are C* and BGC, respectively. Because of these unknown factors, the transform becomes a black-box process that can only be modeled by a machine learning model, such as Random Forest (RF), Adaboost, support vector machine (SVM), decision tree (DT), etc. Deep learning would also seem to be a good choice, but it requires thousands of training samples, which are hard to obtain in our project at present.
Based on the comparison in Table 2, we use random forest (RF) to model the transform process in the GE. Random forest is an ensemble learning method for classification and regression that constructs a multitude of decision trees [18]. Like other supervised models, RF needs a training dataset to adjust its parameters, after which it can be used to estimate BGC from C*. The estimate of the RF is the mean of the estimates of the individual trees. The training dataset contains a set of C* vectors, transformed from the data matrix T by the IFC, and a set of reference BGC data acquired by an invasive glucose meter. There are two reasons why we choose RF to model the GE. First, the IFC is based on maximizing the entropy of C*, which is the same optimization goal as that of RF. Second, its accuracy compares favorably with the other machine learning models in Table 2. Through cross-validation, the number of trees and the tree depth of the RF are set to 200 and 2, and the aggregation method is averaging the predictions of all trees in the forest. Let Equation (16) represent the GE:
$$\hat{c} = \mathrm{RF}(C^{*}) \qquad (16)$$
where ĉ is the estimated BGC.
In summary, the IFC and the GE have been built and are represented by Equations (15) and (16), which together estimate BGC from the absorbance vector A acquired via the prototype designed in Section 4.
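A glucose estimator with the reported settings (200 trees of depth 2, predictions averaged over trees) could be set up with scikit-learn roughly as below. The data here are synthetic stand-ins for the C* vectors and the reference BGC values, generated only so that the example runs; the variable names and the `random_state` values are assumptions for the illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-ins (illustration only): 80 records with 5 independent components.
rng = np.random.default_rng(0)
C_star = rng.normal(size=(80, 5))
bgc = 6.0 + 1.5 * C_star[:, 0] + 0.1 * rng.normal(size=80)   # made-up reference values

train, test = slice(0, 64), slice(64, 80)

# 200 trees with maximum depth 2, as selected by cross-validation in the text.
ge = RandomForestRegressor(n_estimators=200, max_depth=2, random_state=0)
ge.fit(C_star[train], bgc[train])
estimate = ge.predict(C_star[test])

# The forest estimate is the mean of the individual tree estimates.
tree_mean = np.mean([tree.predict(C_star[test]) for tree in ge.estimators_], axis=0)
```

Shallow trees keep the model small, which is a sensible choice when only a few dozen calibration samples per person are available.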

Experiment and Results
After the prototype was produced, a series of clinical trials was performed to verify the accuracy of the non-invasive glucose meter. This section is divided into two parts: data acquisition and experimental results. The former describes how the experimental data were gathered from the subjects, while the latter describes how the accuracy of the non-invasive glucose meter is evaluated with these data.

Data Acquisition
The purpose of data acquisition is to obtain a training dataset and a testing dataset, which are used to train the algorithm and to test the accuracy of the blood glucose estimation. Each record includes ten attributes. The first is the record's ID, and the second to the sixth, gathered by our prototype, are the absorbance data of the earlobe at the wavelengths of 490 nm, 660 nm, 730 nm, 850 nm, and 930 nm. The seventh is the reference blood glucose concentration, collected by a standard invasive glucose meter. The eighth attribute is the name of the subject, and the ninth and tenth are the date and time of the collection.
The data acquisition experiment was performed from 1 to 15 September 2017. Six Asian adult subjects participated in the experiment, five of them diabetic patients and one a healthy individual. The characteristics of each subject are presented in Table 3. The prototype designed in this paper and a SANNUO™ invasive glucose meter were used as the data acquisition equipment. We obtained blood glucose data from each subject five times a day: in the morning on an empty stomach, 2 h after breakfast, right before lunch, 2 h after lunch, and right before dinner. During the data acquisition process, some subjects could not be measured at the specified time for personal reasons, such as physical examinations or outings, which resulted in some missing data. In addition, we excluded some abnormal data caused by hardware problems, such as signal loss and signal anomalies. Finally, 286 valid records remained after eliminating unmeasurable or invalid data. Among these records, 84 rather complete records come from a subject under focused tracking, labeled subject #1, while the remaining 202 records come from the other five subjects, labeled subject #2 to subject #6.

Experiment Results
This paper carries out four test experiments to verify the performance of NGM with the data acquired in Section 6.1. The first experiment, called the accuracy experiment, is designed for accuracy testing. The second experiment, called the convergence experiment, is used to determine the convergence speed of the NGM algorithm and the minimum number of samples for training. Through the third experiment, the transplant experiment, we test whether this NGM can be directly transplanted for use on another person. The performance of NGM is gauged by Clarke Error Grid Analysis (EGA) and by the correlation coefficient between the estimated and the reference BGC, formulated as
$$\mathrm{Corr} = \frac{\sum_{i=1}^{n} (\hat{c}_i - \bar{\hat{c}})(c_i - \bar{c})}{\sqrt{\sum_{i=1}^{n} (\hat{c}_i - \bar{\hat{c}})^2}\, \sqrt{\sum_{i=1}^{n} (c_i - \bar{c})^2}} \qquad (17)$$
where Corr is the correlation coefficient, ĉ and c are the estimated and the reference BGC vectors, respectively, a bar denotes the mean of a vector, and n is the length of the BGC vectors. EGA is designed to quantify the clinical accuracy of blood glucose estimates generated by meters as compared to a reference value [19]. The grid breaks down a scatterplot of a reference glucose meter and an evaluated glucose meter into five regions. The meanings of the regions are listed in Table 4. Region E, for example, contains those points that would confuse treatment of hypoglycemia for hyperglycemia and vice versa.
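Assuming that Corr denotes the usual Pearson correlation coefficient between the estimated and the reference BGC vectors, it can be computed with a few lines of plain Python. The function name and the sample values below are hypothetical.

```python
import math


def corr(estimate, reference):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(reference)
    if len(estimate) != n or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_e = sum(estimate) / n
    mean_r = sum(reference) / n
    cov = sum((e - mean_e) * (r - mean_r) for e, r in zip(estimate, reference))
    var_e = sum((e - mean_e) ** 2 for e in estimate)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    if var_e == 0 or var_r == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_e * var_r)


# Made-up values for illustration: a perfectly linear and a perfectly inverse relation.
perfect = corr([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
inverse = corr([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

Note that a high correlation does not by itself guarantee clinical accuracy, since a constant offset leaves Corr unchanged; this is why the Clarke Error Grid is reported alongside it.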

Accuracy Experiment
We randomly select 80% of the samples acquired from subject #1 as the training dataset, and the other 20% are used as the testing dataset. The accuracy experiment has three steps. Step 1: Train the algorithm with the training dataset. Step 2: Estimate the BGC values of the testing dataset with the trained algorithm. Step 3: Calculate the correlation coefficient between the estimated and the reference BGC, and then draw the Clarke Error Grid. In this experiment, the correlation coefficient Corr of Random Forest with ICA equals 0.819, and the Clarke Error Grid, shown in Figure 5b, places 88.2% of the points in regions A or B, 11.8% in region C, and no point in regions D or E. For comparison, the results of Random Forest without ICA are shown in Figure 5a. It can be concluded that ICA helps to improve the accuracy of Random Forest.

Convergence Experiment
We are particularly interested in how the size of the training dataset affects the algorithm's performance, because this size determines how soon a good model can be built for a patient once he/she starts using our Earlight device. For this purpose, we construct a series of training datasets of different sizes, train the algorithm on each of them, and record the correlation coefficient on the testing dataset. The relationship between the correlation coefficient and the number of samples in the training dataset is shown in Figure 6. From Figure 6, we find that the correlation coefficient rapidly reaches 0.5 when the number of samples is merely 10, and finally tends to a steady value of 0.80 when the number of samples is close to 60. This result demonstrates the convergence of the algorithm. In other words, the NGM system in this paper can partially complete its parameter adjustment after 10 training measurements with a standard blood glucose meter, and reaches a high-accuracy estimation state after 60 training measurements. For comparison, we also ran the convergence experiment on RF without ICA; the results are shown as the dotted line in Figure 6. As can be seen, without the help of ICA, the convergence speed of RF is significantly reduced.
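The convergence procedure, training on progressively larger datasets while scoring on a fixed testing dataset, can be sketched as below. The data are synthetic and the list of training sizes, the channel count, and the model parameters are assumptions chosen for illustration; the printed values say nothing about the behaviour reported in Figure 6.

```python
import numpy as np
from sklearn.decomposition import FastICA
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)

# Synthetic stand-in data (assumption): 4 hypothetical optical channels.
sources = rng.uniform(-1, 1, size=(120, 4))
X = sources @ rng.normal(size=(4, 4))
y = 6.0 + 2.0 * sources[:, 0] + 0.3 * rng.normal(size=120)

X_pool, y_pool = X[:100], y[:100]   # candidate training samples
X_test, y_test = X[100:], y[100:]   # fixed testing dataset

sizes = [10, 20, 40, 60, 80, 100]   # hypothetical training-set sizes
curve = {}
for n in sizes:
    # A fresh ICA + RF model is trained on the first n samples only.
    model = make_pipeline(
        FastICA(n_components=4, whiten="unit-variance", max_iter=1000, random_state=0),
        RandomForestRegressor(n_estimators=50, random_state=0),
    )
    model.fit(X_pool[:n], y_pool[:n])
    curve[n] = np.corrcoef(model.predict(X_test), y_test)[0, 1]

for n, r in curve.items():
    print(f"{n:3d} training samples -> Corr = {r:.3f}")
```

Plotting `curve` against `sizes` gives a convergence curve of the kind described above; repeating the loop without the `FastICA` step gives the comparison curve.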

Comparative Experiment
Selecting a regression method for glucose estimation is critical but difficult, and can only be done by comparing candidate methods experimentally. Before we chose RF as the glucose estimator (GE), we conducted a comparative test of RF [18], Adaboost [30], kNN [31], SVM [33], and Decision Tree (DT) [32] for glucose estimation. This experiment has two purposes: to compare the performance of different regression methods, and to analyze the effect of ICA.
The comparative experiment consists of two parts. The first part selects the optimal parameters for each regression method. Since the performance of a regression method depends directly on its parameters, it is important to select the best group of parameters for each method before conducting the comparative test, so that the experimental results represent its peak performance. In this paper, we use 5-fold cross-validation to select the best parameters for each regression method [34]. First, select a group of parameters for cross-validation. Secondly, use cross-validation to test the accuracy of this group of parameters. In 5-fold cross-validation, the original training dataset is randomly partitioned into 5 equal-sized subsamples. Of the 5 subsamples, a single subsample is retained as the validation data for testing the model, and the remaining 4 subsamples are used as training data. The cross-validation process is then repeated 5 times, with each of the 5 subsamples used exactly once as the validation data. The 5 results from the folds are then averaged to produce a single accuracy (correlation coefficient, Corr), which is taken as the accuracy of this group of parameters. Thirdly, try other groups of parameters and test their accuracy. Finally, select the group of parameters with the highest accuracy as the best group for the next part of the experiment. Column 3 in Table 2 lists the best parameter settings of the different regression methods. All these methods are implemented with scikit-learn [29], a Python machine learning package.
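The parameter selection described above can be sketched with scikit-learn's `GridSearchCV`, scoring each group of parameters by the mean Corr over 5 folds. The candidate parameter groups, the synthetic data, and the helper name `corr` are assumptions made for illustration; they are not the settings listed in Table 2.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import make_scorer
from sklearn.model_selection import GridSearchCV, KFold

def corr(y_true, y_pred):
    """Correlation coefficient used as the accuracy of one validation fold."""
    return np.corrcoef(y_true, y_pred)[0, 1]

rng = np.random.default_rng(2)
# Synthetic training dataset (assumption): 80 samples, 4 hypothetical features.
X_train = rng.normal(size=(80, 4))
y_train = 6.0 + X_train[:, 0] - 0.5 * X_train[:, 1] + 0.3 * rng.normal(size=80)

# Hypothetical groups of parameters to compare.
param_grid = {"n_estimators": [20, 50], "max_depth": [3, None]}

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    scoring=make_scorer(corr),  # mean Corr over the folds ranks each group
    cv=KFold(n_splits=5, shuffle=True, random_state=0),  # random 5-way partition
)
search.fit(X_train, y_train)

print("best group of parameters:", search.best_params_)
print("mean Corr over 5 folds:", search.best_score_)
```

The same pattern applies to the other regression methods by swapping the estimator and its parameter grid.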
The second part of the comparative experiment, like the accuracy and convergence experiments, is designed to compare the accuracy of the different regression methods with and without ICA. First, randomly select 80% of the samples from the dataset acquired from subject #1 as the training dataset, and use the other 20% as the testing dataset. Secondly, train the various regression methods with the best parameters obtained in the first part of this experiment. Thirdly, calculate the accuracy (Corr) in the same way as in the accuracy experiment. Finally, draw the convergence curves of the different models. In Table 2, column 4 lists the best Corr of each regression method, and columns 5 and 6 list the number of training samples required for Corr to reach 0.5 and its best value, respectively. Note that the RF with ICA model needs 60 training measurements to achieve an accuracy of Corr = 0.819, which means that each patient must prick himself/herself 60 times to provide the algorithm with 60 reference blood glucose levels before performing non-invasive measurements. Still, this is 40% fewer than the 100 training measurements required by TensorTip™.
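One plausible way to read the "required number of samples" columns is as the first training-set size on a convergence curve at which Corr reaches a given level, with Inf when the level is never reached (Table 2 uses Inf for methods that do not converge). The sketch below follows that reading; the function name `samples_to_reach` and the curve values are hypothetical, and the paper does not state that the columns were computed this way.

```python
import math

def samples_to_reach(sizes, corrs, threshold):
    """Smallest training-set size whose Corr is at least `threshold`.

    `sizes` must be sorted in ascending order and aligned with `corrs`.
    Returns math.inf if the threshold is never reached.
    """
    for n, r in zip(sizes, corrs):
        if r >= threshold:
            return n
    return math.inf

# Made-up convergence curve, for illustration only.
sizes = [10, 20, 40, 60, 80]
corrs = [0.50, 0.62, 0.74, 0.80, 0.80]

print(samples_to_reach(sizes, corrs, 0.5))         # 10
print(samples_to_reach(sizes, corrs, max(corrs)))  # 60
print(samples_to_reach(sizes, corrs, 0.95))        # inf
```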
As presented in Table 2, RF performs better than the other regression methods. Although RF does not converge as fast as Adaboost, it shows better convergence stability and higher accuracy. Since kNN and DT both show much lower accuracy than RF, they can be ruled out. For glucose estimation, SVM performs quite poorly, possibly because we have not found a proper kernel function.
As can be seen from Table 2, ICA has a positive effect on the accuracy and convergence speed of three regression methods: RF, Adaboost, and DT. The reason is that applying ICA is equivalent to introducing prior knowledge of blood glucose measurement into the regression methods. Thanks to ICA, the convergence speed of these algorithms improves, which ensures a higher accuracy with fewer training samples. This kind of improvement is of great importance to NGM. In addition, RF, Adaboost, and DT are all based on maximum entropy, which ICA also takes as the objective of its decomposition; ICA therefore helps to enhance the accuracy of these three regression methods. As for kNN, ICA does not improve its convergence speed or accuracy as expected; on the contrary, it has a negative effect. This is because kNN is a regression method based on Euclidean distance, while ICA decomposition simply transforms the data from one space to another, which not only does not help but also introduces more uncertainty into the distance calculation.
In conclusion, RF [18] combined with ICA shows the best performance, so this combination has been chosen for the NGM in this paper.

Transplant Experiment
In the transplant experiment, we use the algorithm trained on the dataset acquired from subject #1 to estimate the BGC of the other five subjects. The correlation coefficients turn out to be 0.153, 0.021, −0.172, 0.017, and −0.221. These results are far from ideal, which shows that the individually tailored algorithm cannot be transplanted to others for use. Meanwhile, this result indicates the great influence of individual discrepancies on optical glucose measurement. To overcome the effect of individual differences, it is necessary to have a personally customized NGM, but that is almost never the case. Although the NGM proposed in this paper cannot be transplanted to others directly, it can be transplanted after a brief learning process that readjusts its parameters.

Comparison with Other NGMs
NGM is currently a hotspot in the field of glucose detection. So far, a large number of NGM methods have been proposed, and some of them have achieved fairly high accuracy. However, most of these methods remain lab-only, with unsolved issues of transition and popularization. This situation stems from the limitations of the methods themselves, since they require a large number of sophisticated and expensive instruments, complex operating procedures, and strict experimental conditions. A comparison of the input, accuracy, hardware cost, and condition requirements of the method proposed in this paper and three other recently published methods is listed in Table 5. Different measurement methods have different levels of accuracy, and some have lower accuracy but higher practicability. The quality of a method should not be judged by accuracy alone; more indicators should be taken into consideration in a comprehensive evaluation. As demonstrated by these experiments, the method proposed in this paper not only has favorable accuracy, but also has the advantages of low cost and good robustness. In short, our method shows excellent overall performance.

Figure 2. Structure and circuit design of the Earlight device: (a) Structural design; (b) Light emitter circuit; (c) Circuit of I-V Converter; (d) Filtering and amplifying circuit. VCC: Volt Current Condenser, GND: Ground, SHND: Shut down end, SIGN: Signal end, RS-: Resistance end.

Figure 4. The schematic diagram of the Non-invasive Glucose Measurement (NGM) algorithm, consisting of the Independent Feature Converter (IFC) and Glucose Estimator (GE).

Figure 5. Clarke Error Grid Analysis for our proposed NGM system ((a) is RF without ICA, and (b) is RF with ICA), in which the x-axis is the reference Blood Glucose Concentration (BGC) and the y-axis is the estimated BGC. In (b), 64.7% of points fall in region A, while in (a) only 47.1% do.

Figure 6.

Table 1. Comparison and analysis of mini-invasive and non-invasive blood glucose meters. CE: Conformité Européenne, FDA: U.S. Food and Drug Administration.

Table 2. Comparative experiment results of various regression methods. Inf indicates that the method does not converge.

Table 3. The characteristics of each subject.

Table 4. The meaning of each region in Clarke Error Grid.

Table 5. Comparison of different Non-invasive Glucose Measurement (NGM) methods.