# Application of Improved LightGBM Model in Blood Glucose Prediction


## Abstract


## 1. Introduction

## 2. Materials and Methods

#### 2.1. LightGBM Model

LightGBM fits an additive model of regression trees. At iteration $t$, the regularized objective is

$$Obj^{(t)} = \sum_{i=1}^{n} l\left(y_i,\ \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t), \qquad \Omega(f_t) = \gamma T + \frac{1}{2}\lambda\sum_{j=1}^{T} w_j^{2},$$

where $y_i$ is the objective value, $\hat{y}_i$ is the predicted value, $T$ represents the number of leaf nodes, $q$ denotes the structure function of the tree, and $w$ is the leaf weight.

Expanding the loss to second order with gradients $g_i$ and Hessians $h_i$ and grouping the samples by leaf, $I_j$ is the sample set in leaf node $j$, namely:

$$I_j = \{\, i \mid q(x_i) = j \,\},$$

so that

$$Obj^{(t)} \approx \sum_{j=1}^{T}\left[\left(\sum_{i\in I_j} g_i\right) w_j + \frac{1}{2}\left(\sum_{i\in I_j} h_i + \lambda\right) w_j^{2}\right] + \gamma T.$$

Setting the derivative with respect to $w_j$ to zero, the optimal weight $w_j^{*}$ of the $j$th leaf node is obtained, and the minimum value is obtained as follows:

$$w_j^{*} = -\frac{\sum_{i\in I_j} g_i}{\sum_{i\in I_j} h_i + \lambda}.$$

Substituting $w_j^{*}$ back, the minimal objective $Obj_t^{*}(q)$ is obtained, as follows:

$$Obj_t^{*}(q) = -\frac{1}{2}\sum_{j=1}^{T} \frac{\left(\sum_{i\in I_j} g_i\right)^{2}}{\sum_{i\in I_j} h_i + \lambda} + \gamma T.$$
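The optimal leaf weight $w_j^{*}$ and the minimal objective can be sketched numerically in plain Python (the gradient/Hessian values below are illustrative, not taken from the paper):

```python
# Numeric sketch of the leaf-weight and minimal-objective formulas.
# Each leaf is represented as (gradients, hessians) for its sample set I_j.

def optimal_leaf_weight(grads, hessians, lam=1.0):
    """w_j* = -sum(g_i) / (sum(h_i) + lambda) for one leaf."""
    return -sum(grads) / (sum(hessians) + lam)

def min_objective(leaves, lam=1.0, gamma=0.1):
    """Obj_t*(q) = -1/2 * sum_j G_j^2 / (H_j + lambda) + gamma * T."""
    obj = 0.0
    for grads, hessians in leaves:
        G, H = sum(grads), sum(hessians)
        obj -= 0.5 * G * G / (H + lam)
    return obj + gamma * len(leaves)  # T = number of leaves

# Two-leaf toy tree: w* of the first leaf is -3/(2+1) = -1.0
leaves = [([1.0, 2.0], [1.0, 1.0]), ([-1.0], [1.0])]
w_star = optimal_leaf_weight(*leaves[0])
obj_star = min_objective(leaves)
```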

#### 2.2. Bayesian Hyper-Parameter Optimization Algorithm

#### 2.3. Improved LightGBM Model Based on Bayesian Hyper-Parameter Optimization Algorithm

1. Divide the dataset into a training set and a test set, handle missing values, analyze the weight of each eigenvalue's influence on the result, and delete useless eigenvalues and outliers;
2. Optimize the parameters of the LightGBM model with the Bayesian hyper-parameter optimization algorithm, then construct and train the HY_LightGBM model;
3. Use the HY_LightGBM model for prediction and output the prediction results.

The specific experimental process is shown in Figure 5.

#### 2.4. Data Preprocessing

#### 2.5. Parameter Optimization Based on Bayesian Hyper-Parameter Optimization Algorithm

#### 2.6. Blood Glucose Prediction by HY_LightGBM Model

#### 2.7. Evaluation Indicators
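The comparative tables in Section 3 report MSE, RMSE, and R-Square. A minimal sketch of these three indicators in plain Python:

```python
# Evaluation indicators used in the experiments: MSE, RMSE, and R-Square.
import math

def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(mse(y_true, y_pred))

def r_square(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot
```

Lower MSE/RMSE and higher R-Square indicate better predictive performance, which is how the models in Section 3.2 are ranked.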

## 3. Experimental Results and Discussion

#### 3.1. Experimental Results

#### 3.2. Comparative Experiments

#### 3.2.1. Comparison between the HY_LightGBM Model and LightGBM Model

#### 3.2.2. Comparison between HY_LightGBM Model and Other Classification Models

#### 3.2.3. Comparison of Parameter Tuning among Bayesian Hyper-Parameter Optimization Algorithm, Genetic Algorithm, and Random Searching Algorithm

## 4. Conclusions

## Author Contributions

## Funding

## Conflicts of Interest

## References


**Figure 13.** Comparison chart of the predicted performance of HY_LightGBM, GA_LightGBM, and RS_LightGBM.

| Eigenvalue Name | Eigenvalue Name Explanation (Unit) |
|---|---|
| ID | Physical examination personnel ID |
| Gender | Male/female |
| Age | Age |
| Date of physical examination | Date of physical examination |
| Aspartate aminotransferase | Aspartate aminotransferase (U/L) |
| Alanine aminotransferase | Alanine aminotransferase (U/L) |
| Alkaline phosphatase | Alkaline phosphatase (U/L) |
| γ-Glutamyltransferase | γ-Glutamyltransferase (U/L) |
| Total protein | Total serum protein (g/L) |
| Albumin | Serum albumin (g/L) |
| Globulin | Globulin (g/L) |
| Albumin/globulin ratio | Ratio of albumin to globulin |
| Triglyceride | Serum triglyceride (mmol/L) |
| Total cholesterol | Total cholesterol in lipoproteins (mmol/L) |
| High-density lipoprotein cholesterol | High-density lipoprotein cholesterol (mg/dL) |
| LDL cholesterol | LDL cholesterol (mg/dL) |
| Urea | Urea (mmol/L) |
| Creatinine | Product of muscle metabolism in the human body (μmol/L) |
| Uric acid | Uric acid (μmol/L) |
| Hepatitis B surface antigen | Hepatitis B surface antigen (ng/mL) |
| Hepatitis B surface antibody | Hepatitis B surface antibody (mIU/mL) |
| Hepatitis B e antigen | Hepatitis B e antigen (PEI/mL) |
| Hepatitis B e antibody | Hepatitis B e antibody (P/mL) |
| Hepatitis B core antibody | Hepatitis B core antibody (PEI/mL) |
| Leukocyte count | Leukocyte count (×10^{9}/L) |
| RBC count | RBC count (×10^{12}/L) |
| Hemoglobin | Hemoglobin (g/L) |
| Hematocrit | Hematocrit |
| Mean corpuscular volume | Mean corpuscular volume (fL) |
| Mean corpuscular hemoglobin | Mean corpuscular hemoglobin (pg) |
| Mean corpuscular hemoglobin concentration | Mean corpuscular hemoglobin concentration (g/L) |
| Red blood cell volume distribution width | Red blood cell volume distribution width |
| Platelet count | Platelet count (×10^{9}/L) |
| Mean platelet volume | Mean platelet volume (fL) |
| Platelet volume distribution width | Platelet volume distribution width (%) |
| Platelet specific volume | Platelet specific volume (%) |
| Neutrophils | Neutrophils (%) |
| Lymphocytes | Lymphocytes (%) |
| Monocytes | Monocytes (%) |
| Eosinophils | Eosinophils (%) |
| Basophils | Basophils (%) |
| Blood glucose | Blood glucose level (mg/dL) |

| Parameter Name | Default Value | Optimal Parameters | Parameter Implication |
|---|---|---|---|
| learning_rate | 0.1 | 0.052 | Learning rate |
| n_estimators | 10 | 376 | Number of base learners |
| min_data_in_leaf | 20 | 18 | Minimum number of records in a leaf |
| bagging_fraction | 1 | 0.9 | Fraction of the data used in each iteration |
| feature_fraction | 1 | 0.5 | Fraction of features randomly selected in each iteration |
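Collected into a parameter dict keyed by LightGBM's native parameter names, the tuned values read as follows (a sketch of how they would be passed to the model):

```python
# Optimal values found by the Bayesian hyper-parameter search (table above),
# keyed by LightGBM's native parameter names.
best_params = {
    "learning_rate": 0.052,
    "n_estimators": 376,
    "min_data_in_leaf": 18,
    "bagging_fraction": 0.9,
    "feature_fraction": 0.5,
}
```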

| Model Name | MSE | RMSE | R-Square | Training Time |
|---|---|---|---|---|
| HY_LightGBM | 0.5961 | 0.7721 | 0.2236 | 26.7938 s |

| Model Name | MSE | RMSE | R-Square | Training Time |
|---|---|---|---|---|
| LightGBM | 0.6159 | 0.7848 | 0.1978 | 3.5039 s |
| HY_LightGBM | 0.5961 | 0.7721 | 0.2236 | 26.7938 s |

| Model Name | MSE | RMSE | R-Square | Training Time |
|---|---|---|---|---|
| HY_LightGBM | 0.5961 | 0.7721 | 0.2236 | 26.7938 s |
| XGBoost | 0.6284 | 0.7927 | 0.1815 | 6.2837 s |
| CatBoost | 0.6483 | 0.8051 | 0.1556 | 73.3301 s |

| Parameter Name | GA_LightGBM | RS_LightGBM |
|---|---|---|
| learning_rate | 0.05 | 0.05 |
| n_estimators | 400 | 370 |
| min_data_in_leaf | 60 | 36 |
| bagging_fraction | 0.9 | 0.9 |
| feature_fraction | 0.5 | 0.98 |

| Model Name | MSE | RMSE | R-Square |
|---|---|---|---|
| GA_LightGBM | 0.6116 | 0.7821 | 0.2033 |
| RS_LightGBM | 0.6094 | 0.7806 | 0.2063 |
| HY_LightGBM | 0.5961 | 0.7721 | 0.2236 |

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Wang, Y.; Wang, T.
Application of Improved LightGBM Model in Blood Glucose Prediction. *Appl. Sci.* **2020**, *10*, 3227.
https://doi.org/10.3390/app10093227
