# Automated Machine Learning to Develop Predictive Models of Metabolic Syndrome in Patients with Periodontal Disease


## Abstract


## 1. Introduction

- (1) Several predictive models were built using two state-of-the-art AutoML frameworks: H2O AutoML and Auto-sklearn.
- (2) The best two models were selected, evaluated, and validated, comparing the prediction results through performance metrics.
- (3) A SHAP wrapper specifically designed for Auto-sklearn models was implemented to obtain their corresponding SHAP values.
- (4) SHAP was used to analyze machine learning models, to explain the predictions, and to highlight the most important predictive variables.

The null hypothesis (${H}_{0}$) states that a higher periodontitis stage [44] does not increase the incidence of metabolic syndrome. Establishing these connections matters because it opens the way to a multidisciplinary treatment paradigm and points to procedures or techniques that could prevent or interrupt critical pathological links, yielding positive results and enhancing overall health.

## 2. Related Work

## 3. Materials and Methods

#### 3.1. Efficient Model Development with AutoML Frameworks

- **Model Selection**: Automatically selecting an appropriate machine learning model or algorithm based on the problem type (e.g., classification, regression) and data set characteristics.
- **Feature Selection**: Choosing a subset of relevant features or variables from a larger set of available features, to improve the overall efficiency of the machine learning pipeline (improve model interpretability and reduce computational complexity).
- **Hyperparameter Tuning**: Optimizing the hyperparameters of the selected model(s) to improve their performance.
- **Model Training**: Training the selected model(s) on the training data set using the optimized hyperparameters.
- **Validation**: Evaluating model performance on the validation data set to ensure that it meets predefined criteria, such as accuracy or F1 score. If the model does not meet the criteria, the process may return to the hyperparameter tuning or model selection steps.
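The loop formed by these steps can be illustrated with a deliberately small sketch. It is not how H2O AutoML or Auto-sklearn work internally (both rely on far more elaborate search strategies); it only shows model selection, hyperparameter tuning, training, and validation chained together. The function names (`fit_majority`, `fit_knn`, `toy_automl`), the single numeric feature, and the data are all hypothetical, and feature selection is omitted because there is only one feature.

```python
from collections import Counter


def fit_majority(train):
    """Baseline model: always predict the most frequent training label."""
    label = Counter(y for _x, y in train).most_common(1)[0][0]
    return lambda x: label


def fit_knn(train, k):
    """k-nearest-neighbours classifier on a single numeric feature."""
    def predict(x):
        nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
        return Counter(y for _x, y in nearest).most_common(1)[0][0]
    return predict


def accuracy(model, data):
    """Fraction of (x, y) pairs for which the model predicts y."""
    return sum(model(x) == y for x, y in data) / len(data)


def toy_automl(train, valid, min_accuracy=0.8):
    """Try every (model family, hyperparameter) pair and keep the best one."""
    # Search space: (name, fit function, hyperparameter grid) -- hypothetical.
    search_space = [
        ("majority", fit_majority, [{}]),
        ("knn", fit_knn, [{"k": k} for k in (1, 3, 5)]),
    ]
    best = None
    for name, fit, grid in search_space:      # model selection
        for params in grid:                   # hyperparameter tuning
            model = fit(train, **params)      # model training
            score = accuracy(model, valid)    # validation
            if best is None or score > best["score"]:
                best = {"name": name, "params": params,
                        "score": score, "model": model}
    # Validation criterion: a real pipeline could loop back if this is False.
    best["meets_criterion"] = best["score"] >= min_accuracy
    return best


# Made-up (feature, label) pairs, unrelated to the study data.
train = [(0.1, 0), (0.2, 0), (0.3, 0), (0.4, 0), (0.35, 0),
         (0.6, 1), (0.7, 1), (0.8, 1), (0.9, 1), (0.65, 1)]
valid = [(0.15, 0), (0.25, 0), (0.45, 0), (0.55, 1), (0.75, 1), (0.85, 1)]

result = toy_automl(train, valid)
print(result["name"], result["params"], result["score"])
```

Keeping the first configuration that reaches the best validation score is an arbitrary tie-breaking choice made for this sketch.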

#### 3.2. Model-Agnostic Explainability with SHAP Method

- (Efficiency) The contributions should add up to the difference between the profit generated by all players and the profit obtained without any player:$$\sum _{i=1}^{d}{\varphi}_{i}(v)=v(D)-v(\varnothing ).$$
- (Symmetry) If two players are interchangeable (their impact on the generated profit is the same), then their individual contributions are identical: $v(S\cup \{i\})=v(S\cup \{j\})\ \text{for all}\ S\Rightarrow {\varphi}_{i}(v)={\varphi}_{j}(v)$.
- (Dummy) If a player has no impact on the generated profit, then that player's contribution is zero: $v(S\cup \{i\})=v(S)\ \text{for all}\ S\Rightarrow {\varphi}_{i}(v)=0$.
- (Monotonicity) If the marginal contribution of player $i$ to the profit generated in game $v$, when joining any coalition $S$, is at least the one obtained in game ${v}^{\prime}$, then the contribution of player $i$ in $v$ is at least the one in ${v}^{\prime}$: $v(S\cup \{i\})-v(S)\ge {v}^{\prime}(S\cup \{i\})-{v}^{\prime}(S)\ \text{for all}\ S\Rightarrow {\varphi}_{i}(v)\ge {\varphi}_{i}({v}^{\prime})$.
- (Linearity) If the game $v$ is viewed as a linear combination of games ${v}_{1},{v}_{2},\dots ,{v}_{k}$, or $v={c}_{1}{v}_{1}+{c}_{2}{v}_{2}+\dots +{c}_{k}{v}_{k}$, then the contribution of each player $i$ in the game $v$ is expressed as follows: ${\varphi}_{i}(v)={c}_{1}{\varphi}_{i}({v}_{1})+{c}_{2}{\varphi}_{i}({v}_{2})+\dots +{c}_{k}{\varphi}_{i}({v}_{k})$, $i\in D$.
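To make these properties concrete, the following sketch computes exact Shapley values for a three-player game by averaging marginal contributions over all coalitions, using the standard weight $|S|!\,(d-|S|-1)!/d!$. The game `profit` and the player names are invented for illustration: players A and B are interchangeable and player C is a dummy, so Symmetry, Dummy, and Efficiency can be checked directly.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, v):
    """Exact Shapley values of game v, where v maps a frozenset to a number."""
    d = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for size in range(d):
            # Weight of every coalition S of this size that excludes player i.
            weight = factorial(size) * factorial(d - size - 1) / factorial(d)
            for coalition in combinations(others, size):
                S = frozenset(coalition)
                total += weight * (v(S | {i}) - v(S))  # marginal contribution
        phi[i] = total
    return phi


def profit(S):
    """Hypothetical game: A and B each earn 10, plus 5 when both cooperate."""
    return 10 * ("A" in S) + 10 * ("B" in S) + 5 * ("A" in S and "B" in S)


players = ["A", "B", "C"]
phi = shapley_values(players, profit)
print(phi)

# Efficiency: contributions add up to v(D) - v(empty set).
print(sum(phi.values()), profit(frozenset(players)) - profit(frozenset()))
```

The enumeration visits every coalition, so the cost grows exponentially with the number of players; it is only practical for very small games.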

$({X}_{1},{X}_{2},\dots ,{X}_{d})$. In the following, uppercase symbols (e.g., $X$) are employed to represent random variables, while lowercase symbols (e.g., $x$) are used to denote specific values.

- Generate K sample coalitions: ${{z}^{\prime}}_{k}\in {\{0,1\}}^{m},1\le k\le K$. These compose the data set for the regression model.
- Get the prediction for each ${{z}^{\prime}}_{k}$, by mapping it into the original feature space and then applying the model $f$: $f({h}_{x}({{z}^{\prime}}_{k}))$.
- Compute the weight for each ${{z}^{\prime}}_{k}$ with the SHAP kernel, defined by the following formula: ${\pi}_{x}({{z}^{\prime}}_{k})=\frac{m-1}{\binom{m}{|{{z}^{\prime}}_{k}|}\,|{{z}^{\prime}}_{k}|\,(m-|{{z}^{\prime}}_{k}|)}$, where $\left|{{z}^{\prime}}_{k}\right|$ is the number of present features in the coalition.
- Train the linear regression model (1) by minimizing the following loss function:$$L(f,e,{\pi}_{x})=\sum _{{z}^{\prime}\in Z}{\left[f({h}_{x}({z}^{\prime}))-e({z}^{\prime})\right]}^{2}{\pi}_{x}({z}^{\prime})$$
- Return approximate Shapley values ${\varphi}_{j}$ (coefficients of the linear regression model).
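The steps above can be sketched in a few dozen lines of plain Python. This is a minimal illustration under stated assumptions, not the implementation in the SHAP library: it enumerates every coalition instead of sampling $K$ of them, it maps absent features to a fixed `baseline` value (a simplification of ${h}_{x}$), and it treats the empty and full coalitions, whose kernel weight is infinite, as hard constraints by fixing the intercept to $f(\text{baseline})$ and eliminating the last coefficient. The helper names `shap_kernel_weight`, `solve_linear_system`, and `kernel_shap` are hypothetical.

```python
from itertools import product
from math import comb


def shap_kernel_weight(m, s):
    """SHAP kernel weight for a coalition with s of m features present."""
    return (m - 1) / (comb(m, s) * s * (m - s))  # defined only for 0 < s < m


def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def kernel_shap(f, x, baseline):
    """Return (phi_0, [phi_1..phi_m]) for instance x; requires m >= 2."""
    m = len(x)
    phi0 = f(baseline)        # prediction with every feature absent
    total = f(x) - phi0       # efficiency constraint: sum of phi_j
    n = m - 1                 # last coefficient is eliminated
    AtWA = [[0.0] * n for _ in range(n)]
    AtWb = [0.0] * n
    for z in product([0, 1], repeat=m):
        s = sum(z)
        if s == 0 or s == m:
            continue          # handled by the two constraints above
        w = shap_kernel_weight(m, s)
        # h_x(z'): present features take x, absent ones take the baseline.
        mapped = [xi if zi else bi for zi, xi, bi in zip(z, x, baseline)]
        a = [z[j] - z[-1] for j in range(n)]
        target = f(mapped) - phi0 - z[-1] * total
        for j in range(n):    # accumulate weighted normal equations
            AtWb[j] += w * a[j] * target
            for k in range(n):
                AtWA[j][k] += w * a[j] * a[k]
    coef = solve_linear_system(AtWA, AtWb)
    return phi0, coef + [total - sum(coef)]


# Toy linear model: its Shapley values are w_j * (x_j - baseline_j).
weights = [2.0, -1.0, 0.5]
linear = lambda v: 0.3 + sum(w * vi for w, vi in zip(weights, v))
phi0, phi = kernel_shap(linear, [1.0, 2.0, 3.0], [0.0, 1.0, 1.0])
print(phi0, phi)
```

With full enumeration the regression recovers the exact Shapley values; in practice the number of coalitions is too large, which is why the procedure samples $K$ coalitions and returns an approximation.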

${X}_{j}$, its importance is computed with formula

#### 3.3. Study Design

#### 3.4. Data Set

- Dependent variable (target): Metabolic syndrome.
- Independent variables (feature variables): DMFT (Decayed, Missing due to caries, and Filled Teeth), CPI (Community Periodontal Index), Periodontal pockets depth, Gingival bleeding, Daily tooth brushing, Dental control, Gingival attachment loss, CV (Cardiovascular) risk, Carotid atherosclerosis, and EQ-5D-5L score.

#### 3.5. Performance Evaluation

## 4. Results

#### 4.1. Prediction Models

#### 4.2. Explainability of Prediction Models Using SHAP Framework

- Global interpretability: the SHAP values provide a comprehensive view of how each predictor contributes to the target variable, offering insights into both positive and negative influences. This allows understanding the overall impact of each feature on the model’s predictions.
- Local interpretability: each observation is assigned its own set of SHAP values. Thus, one can explain why a case receives its prediction and the contributions of the predictors.
- SHAP does not have direct support for H2O models.

#### 4.2.1. Global Interpretability

- a strong positive correlation with Periodontal pockets (depth) and CV risk;
- a moderate positive correlation with CPI, (gingival) Bleeding, and Gingival attachment loss;
- a moderate negative correlation with EQ-5D-5L score, (daily) Tooth brushing, and Dental control.

- **Feature importance**: Variables are ranked in descending order of importance, as in the variable importance plot.
- **Impact**: In the beeswarm chart, the x-axis represents the SHAP values, computed for each feature of each record in the data set. A SHAP value on the right side of the plot corresponds to a positive impact on the prediction, pushing the model toward predicting 1 (metabolic syndrome). Conversely, a SHAP value on the left side of the plot corresponds to a lower prediction, pushing the model toward predicting 0 (absence of metabolic syndrome).
- **Value**: Colors indicate whether a feature variable’s value is relatively high (shade close to red) or low (shade close to blue) for a specific observation.
- **Correlation**: The summary plot shows the positive and negative relationships of the predictors with the target variable. The position of a point along the horizontal axis shows how the feature’s value for that observation affects the prediction (higher or lower). Thus, a high depth of periodontal pockets has a positive association with metabolic syndrome: the “high” comes from the red color (which corresponds to high values of the variable), and the “positive” impact is shown on the x-axis (the SHAP value is on the right side of the plot). Similarly, the EQ-5D-5L score is negatively correlated with metabolic syndrome (the target variable).

From the charts represented in Figure 7, it can be concluded that high values (red dots) of the variables CV risk, CPI, DMFT, Carotid atherosclerosis, Gingival attachment loss, and (gingival) Bleeding are associated with positive SHAP values, so they correspond to an increased probability of occurrence of metabolic syndrome. High values of the variables (daily) Tooth brushing, Dental control, and EQ-5D-5L (represented in the charts by red dots) correspond to small (negative) SHAP values when compared to the low values of these feature variables. This suggests that as these variables increase (e.g., higher levels of tooth brushing, better dental control, or higher EQ-5D-5L scores), the probability of the occurrence of metabolic syndrome decreases.
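The way such a summary plot is read can be mimicked numerically. The sketch below ranks features by their mean absolute SHAP value (the ordering used by SHAP bar and beeswarm plots, assumed here to match the importance ranking described above) and reports the sign of the covariance between a feature's values and its SHAP values as the direction of the association. All numbers are made up for illustration and are not the study's SHAP values; `global_importance` and `direction` are hypothetical helper names.

```python
from statistics import mean


def global_importance(shap_values, feature_names):
    """Rank features by mean absolute SHAP value, most important first."""
    scores = {
        name: mean(abs(row[j]) for row in shap_values)
        for j, name in enumerate(feature_names)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def direction(feature_column, shap_column):
    """Sign of the covariance between feature values and their SHAP values."""
    mx, ms = mean(feature_column), mean(shap_column)
    cov = mean((x - mx) * (s - ms) for x, s in zip(feature_column, shap_column))
    if cov > 0:
        return "positive"   # high values push the prediction up
    if cov < 0:
        return "negative"   # high values push the prediction down
    return "none"


# Illustrative, invented data: one row per observation.
feature_names = ["Periodontal pockets depth", "EQ-5D-5L score"]
feature_values = [[3, 0.90], [2, 0.50], [0, 0.30], [0, 0.95]]
shap_values = [[0.30, -0.05], [0.10, 0.08], [-0.20, 0.12], [-0.25, -0.10]]

ranking = global_importance(shap_values, feature_names)
for j, name in enumerate(feature_names):
    column = [row[j] for row in feature_values]
    contributions = [row[j] for row in shap_values]
    print(name, direction(column, contributions))
print(ranking)
```

A covariance sign is a coarse summary; the beeswarm plot additionally shows the spread of individual observations, which this sketch does not capture.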

#### 4.2.2. Local Interpretability

## 5. Discussion

The null hypothesis (${H}_{0}$) stated that a higher periodontitis stage does not increase the incidence of metabolic syndrome. Our findings show that stages III and IV of periodontitis (periodontal pockets depth $\ge 6$ mm) are associated with an increased incidence of metabolic syndrome, while for stage I (periodontal pockets depth $\le 4$ mm) a low incidence of metabolic syndrome is found.

## 6. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

## References

1. Nazir, M.; Al-Ansari, A.; Al-Khalifa, K.; Alhareky, M.; Gaffar, B.; Almas, K. Global Prevalence of Periodontal Disease and Lack of its Surveillance. Sci. World J. **2020**, 2020, 2146160.
2. Meurman, J.H.; Sanz, M.; Janket, S.J. Oral Health, Atherosclerosis, and Cardiovascular Disease. Crit. Rev. Oral Biol. Med. **2004**, 15, 403–413.
3. Lowe, G.D.O. Dental disease, coronary heart disease and stroke, and inflammatory markers: What are the associations, and what do they mean? Circulation **2004**, 109, 1076–1078.
4. Delisle, H. Early nutritional influences on obesity, diabetes and cardiovascular disease risk. International Workshop, Université de Montréal, June 6–9, 2004. Matern. Child Nutr. **2005**, 1, 128–129.
5. Sakakibara, B.M.; Obembe, A.O.; Eng, J.J. The prevalence of cardiometabolic multimorbidity and its association with physical activity, diet, and stress in Canada: Evidence from a population-based cross-sectional study. BMC Public Health **2019**, 19, 1361.
6. Gomes-Filho, I.S.; das Mercês, M.C.; de Santana Passos-Soares, J.; Seixas da Cruz, S.; Teixeira Ladeia, A.M.; Trindade, S.C.; de Moraes Marcílio Cerqueira, E.; Freitas Coelho, J.M.; Marques Monteiro, F.M.; Barreto, M.L.; et al. Severity of Periodontitis and Metabolic Syndrome: Is There an Association? J. Periodontol. **2016**, 87, 357–366.
7. Kotin, J.; Walther, C.; Wenzel, U.; Zyriax, B.C.; Borof, K.; Schnabel, R.B.; Seedorf, U.; Jagodzinski, A.; Heydecke, G.; Lamprecht, R.; et al. Association between periodontitis and metabolic syndrome in the Hamburg City Health Study. J. Periodontol. **2022**, 93, 1150–1160.
8. Rezaianzadeh, A.; Namayandeh, S.M.; Sadr, S.M. National Cholesterol Education Program Adult Treatment Panel III Versus International Diabetic Federation Definition of Metabolic Syndrome, Which One is Associated with Diabetes Mellitus and Coronary Artery Disease? Int. J. Prev. Med. **2012**, 3, 552–558.
9. Expert Panel on Detection, Evaluation, and Treatment of High Blood Cholesterol in Adults. Executive Summary of The Third Report of The National Cholesterol Education Program (NCEP) Expert Panel on Detection, Evaluation, And Treatment of High Blood Cholesterol in Adults (Adult Treatment Panel III). JAMA **2001**, 285, 2486–2497.
10. Pirih, F.Q.; Monajemzadeh, S.; Singh, N.; Sinacola, R.S.; Shin, J.M.; Chen, T.; Fenno, J.C.; Kamarajan, P.; Rickard, A.H.; Travan, S.; et al. Association between metabolic syndrome and periodontitis: The role of lipids, inflammatory cytokines, altered host response, and the microbiome. Periodontol 2000 **2021**, 87, 50–75.
11. Schultze, L.B.; Maldonado, A.; Lussi, A.; Sculean, A.; Eick, S. The Impact of the pH Value on Biofilm Formation. Monogr. Oral Sci. **2021**, 29, 19–29.
12. Senini, V.; Amara, U.; Paul, M.; Kim, H. Porphyromonas gingivalis lipopolysaccharide activates platelet Cdc42 and promotes platelet spreading and thrombosis. J. Periodontol. **2019**, 90, 1336–1345.
13. Cowan, L.T.; Lakshminarayan, K.; Lutsey, P.L.; Folsom, A.R.; Beck, J.; Offenbacher, S.; Pankow, J.S. Periodontal disease and incident venous thromboembolism: The Atherosclerosis Risk in Communities study. J. Clin. Periodontol. **2019**, 46, 12–19.
14. Pardo Romero, F.F.; Hernández, L.J. Periodontal disease: Epidemiological approaches for its analysis as a public health concern. Rev. Salud Publica **2018**, 20, 258–264.
15. Maas, C.; Renné, T. Coagulation factor XII in thrombosis and inflammation. Blood **2018**, 131, 1903–1909.
16. Kabashima, H.; Maeda, K.; Iwamoto, Y.; Hirofuji, T.; Yoneda, M.; Yamashita, K.; Aono, M. Partial characterization of an interleukin-1-like factor in human gingival crevicular fluid from patients with chronic inflammatory periodontal disease. Infect. Immun. **1990**, 58, 2621–2627.
17. Saito, T.; Murakami, M.; Shimazaki, Y.; Matsumoto, S.; Yamashita, Y. The extent of alveolar bone loss is associated with impaired glucose tolerance in Japanese men. J. Periodontol. **2006**, 77, 392–397.
18. Saito, T.; Murakami, M.; Shimazaki, Y.; Oobayashi, K.; Matsumoto, S.; Koga, T. Association Between Alveolar Bone Loss and Elevated Serum C-Reactive Protein in Japanese Men. J. Periodontol. **2003**, 74, 1741–1746.
19. Jain, P.; Ved, A.; Dubey, R.; Singh, N.; Parihar, A.S.; Maytreyee, R. Comparative Evaluation of Serum Tumor Necrosis Factor α in Health and Chronic Periodontitis: A Case-Control Study. Contemp. Clin. Dent. **2020**, 11, 342–349.
20. Chopra, R.; Patil, S.R.; Kalburgi, N.B.; Mathur, S. Association between alveolar bone loss and serum C-reactive protein levels in aggressive and chronic periodontitis patients. J. Indian Soc. Periodontol. **2012**, 16, 28–31.
21. Eckel, R.H.; Grundy, S.M.; Zimmet, P.Z. The metabolic syndrome. Lancet **2005**, 365, 1415–1428.
22. Gobin, R.; Tian, D.; Liu, Q.; Wang, J. Periodontal Diseases and the Risk of Metabolic Syndrome: An Updated Systematic Review and Meta-Analysis. Front. Endocrinol. **2020**, 11, 336.
23. Ngoude, J.X.E.; Moor, V.J.A.; Nadia-Flore, T.T.; Agoons, B.B.; Marcelle, G.G.C.; MacBrain, E.E.; Tcheutchoua, D.N.; Nkeck, J.R. Relationship between periodontal diseases and newly-diagnosed metabolic syndrome components in a sub-Saharan population: A cross sectional study. BMC Oral Health **2021**, 21, 326.
24. Demmer, R.T.; Squillaro, A.; Papapanou, P.N.; Rosenbaum, M.; Friedewald, W.T.; Jacobs, D.R., Jr.; Desvarieux, M. Periodontal infection, systemic inflammation, and insulin resistance: Results from the continuous National Health and Nutrition Examination Survey (NHANES) 1999–2004. Diabetes Care **2012**, 35, 2235–2242.
25. Blasco-Baque, V.; Garidou, L.; Pomié, C.; Escoula, Q.; Loubieres, P.; Le Gall-David, S.; Lemaitre, M.; Nicolas, S.; Klopp, P.; Waget, A.; et al. Periodontitis induced by Porphyromonas gingivalis drives periodontal microbiota dysbiosis and insulin resistance via an impaired adaptive immune response. Gut **2017**, 66, 872–885.
26. Jepsen, S.; Suvan, J.; Deschner, J. The association of periodontal diseases with metabolic syndrome and obesity. Periodontology **2020**, 83, 125–153.
27. Lamster, I.B.; Pagan, M. Periodontal disease and the metabolic syndrome. Int. Dent. J. **2017**, 67, 67–77.
28. Velioğlu, E.M.; Aydındoğan, S.; Hakkı, S.S. Metabolic Syndrome and Periodontal Disease. Curr. Oral Health **2023**, 10, 43–51.
29. World Health Organization. Oral Health Surveys. Basic Methods, 5th ed.; WHO Press: Geneva, Switzerland, 2013; pp. 35–56.
30. Huck, N. Large data sets and machine learning: Applications to statistical arbitrage. Eur. J. Oper. Res. **2019**, 278, 330–342.
31. Ganaie, M.A.; Hu, M.; Malik, A.K.; Tanveer, M.; Suganthan, P.N. Ensemble deep learning: A review. Eng. Appl. Artif. Intell. **2022**, 115, 105151.
32. O’Neil, C. Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy; Crown Publishing Group: New York, NY, USA, 2016; 272p.
33. Arrieta, A.B.; Díaz-Rodríguez, N.; Del Ser, J.; Bennetot, A.; Tabik, S.; Barbado, A.; Garcia, S.; Gil-Lopez, S.; Molina, D.; Benjamins, R.; et al. Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Inf. Fusion **2020**, 58, 82–115.
34. Goodman, B.; Flaxman, S. European Union Regulations on Algorithmic Decision-Making and a “Right to Explanation”. AI Mag. **2017**, 38, 50–57.
35. Lundberg, S.M.; Lee, S.-I. A unified approach to interpreting model predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS′17), Long Beach, CA, USA, 4–9 December 2017; Curran Associates Inc.: Red Hook, NY, USA, 2017; pp. 4768–4777.
36. Zhong, X.; Gallagher, B.; Liu, S.; Kailkhura, B.; Hiszpanski, A.; Han, T.Y.J. Explainable machine learning in materials science. Npj Comput. Mater. **2022**, 8, 204.
37. Vishwarupe, V.; Joshi, P.M.; Mathias, N.; Maheshwari, S.; Mhaisalkar, S.; Pawar, V. Explainable AI and Interpretable Machine Learning: A Case Study in Perspective. Procedia Comput. Sci. **2022**, 204, 869–876.
38. Herdman, M.; Gudex, C.; Lloyd, A.; Janssen, M.; Kind, P.; Parkin, D.; Bonsel, G.; Badia, X. Development and preliminary testing of the new five-level version of EQ-5D (EQ-5D-5L). Qual. Life Res. **2011**, 20, 1727–1736.
39. eq5d: Methods for Analysing EQ-5D Data and Calculating EQ-5D Index Scores. Available online: https://rdrr.io/cran/eq5d/ (accessed on 6 September 2023).
40. LeDell, E.; Poirier, S. H2O AutoML: Scalable Automatic Machine Learning. In Proceedings of the 7th ICML Workshop on Automated Machine Learning, Vienna, Austria, 17–18 July 2020; Available online: https://www.automl.org/wp-content/uploads/2020/07/AutoML_2020_paper_61.pdf (accessed on 6 March 2022).
41. Feurer, M.; Klein, A.; Eggensperger, K.; Springenberg, J.T.; Blum, M.; Hutter, F. Efficient and robust automated machine learning. In Proceedings of the 28th International Conference on Neural Information Processing Systems, Montreal, Canada, 7–12 December 2015; Volume 2, pp. 2755–2763.
42. Feurer, M.; Eggensperger, K.; Falkner, S.; Lindauer, M.; Hutter, F. Auto-Sklearn 2.0: Hands-Free AutoML via Meta-Learning. arXiv **2022**, arXiv:2007.04074. Available online: https://arxiv.org/abs/2007.04074 (accessed on 3 March 2022).
43. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine learning in python. J. Mach. Learn. Res. **2011**, 12, 2825–2830.
44. Tonetti, M.S.; Greenwell, H.; Kornman, K.S. Staging and grading of periodontitis: Framework and proposal of a new classification and case definition. J. Periodontol. **2018**, 89 (Suppl. S1), S159–S172.
45. Patel, J.S.; Su, C.; Tellez, M.; Albandar, J.M.; Rao, R.; Iyer, V.; Shi, E.; Wu, H. Developing and testing a prediction model for periodontal disease using machine learning and big electronic dental record data. Front. Artif. Intell. **2022**, 5, 979525.
46. Yu, C.; Lin, Y.; Lin, C.; Wang, S.; Lin, S.; Lin, S.; Wu, J.; Chang, S. Predicting Metabolic Syndrome with Machine Learning Models Using a Decision Tree Algorithm: Retrospective Cohort Study. JMIR Med. Inform. **2020**, 8, e17110.
47. Sghaireen, M.G.; Al-Smadi, Y.; Al-Qerem, A.; Srivastava, K.C.; Ganji, K.K.; Alam, M.K.; Nashwan, S.; Khader, Y. Machine Learning Approach for Metabolic Syndrome Diagnosis Using Explainable Data-Augmentation-Based Classification. Diagnostics **2022**, 12, 3117.
48. Yang, H.; Yu, B.; OuYang, P.; Li, X.; Lai, X.; Zhang, G.; Zhang, H. Machine learning-aided risk prediction for metabolic syndrome based on 3 years study. Sci. Rep. **2022**, 12, 2248.
49. Gutiérrez-Esparza, G.O.; Ramírez-delReal, T.A.; Martínez-García, M.; Infante Vázquez, O.; Vallejo, M.; Hernández-Torruco, J. Machine and Deep Learning Applied to Predict Metabolic Syndrome without a Blood Screening. Appl. Sci. **2021**, 11, 4334.
50. Zhang, H.; Chen, D.; Shao, J.; Zou, P.; Cui, N.; Tang, L.; Wang, X.; Wang, D.; Wu, J.; Ye, Z. Machine Learning-Based Prediction for 4-Year Risk of Metabolic Syndrome in Adults: A Retrospective Cohort Study. Risk Manag. Healthc. Policy **2021**, 14, 4361–4368.
51. Park, J.E.; Mun, S.; Lee, S. Metabolic Syndrome Prediction Models Using Machine Learning and Sasang Constitution Type. Evid. Based Complement. Altern. Med. **2021**, 2021, 8315047.
52. Monsarrat, P.; Bernard, D.; Marty, M.; Cecchin-Albertoni, C.; Doumard, E.; Gez, L.; Aligon, J.; Vergnes, J.N.; Casteilla, L.; Kemoun, P. Systemic Periodontal Risk Score Using an Innovative Machine Learning Strategy: An Observational Study. J. Pers. Med. **2022**, 12, 217.
53. Bashir, N.Z.; Rahman, Z.; Chen, S.L. Systematic comparison of machine learning algorithms to develop and validate predictive models for periodontitis. J. Clin. Periodontol. **2022**, 49, 958–969.
54. Shin, H.; Shim, S.; Oh, S. Machine learning-based predictive model for prevention of metabolic syndrome. PLoS ONE **2023**, 18, e0286635.
55. Pietropaoli, D.; Altamura, S.; Ortu, E.; Guerrini, L.; Pizarro, T.T.; Ferri, C.; Del Pinto, R. Association between metabolic syndrome components and gingival bleeding is women-specific: A nested cross-sectional study. J. Transl. Med. **2023**, 21, 252.
56. Fukui, N.; Shimazaki, Y.; Shinagawa, T.; Yamashita, Y. Periodontal status and metabolic syndrome in middle-aged Japanese. J. Periodontol. **2012**, 83, 1363–1371.
57. Ytzhaik, N.; Zur, D.; Goldstein, C.; Almoznino, G. Obstructive Sleep Apnea, Metabolic Dysfunction, and Periodontitis—Machine Learning and Statistical Analyses of the Dental, Oral, Medical Epidemiological (DOME) Big Data Study. Metabolites **2023**, 13, 595.
58. Trigka, M.; Dritsas, E. Predicting the Occurrence of Metabolic Syndrome Using Machine Learning Models. Computation **2023**, 11, 170.
59. Nibali, L.; Donos, N.; Terranova, V.; Di Pino, A.; Di Marca, S.; Ferrara, V.; Pisano, M.; Scicali, R.; Rabuazzo, A.M.; Purrello, F.; et al. Left ventricular geometry and periodontitis in patients with the metabolic syndrome. Clin. Oral Investig. **2019**, 23, 2695–2703.
60. Gomes-Filho, I.S.; Santos, P.N.P.; Cruz, S.S.; Figueiredo, A.C.M.G.; Trindade, S.C.; Ladeia, A.M.; Cerqueira, E.M.M.; Passos-Soares, J.S.; Coelho, J.M.F.; Hintz, A.M.; et al. Periodontitis and its higher levels of severity are associated with the triglyceride/high density lipoprotein cholesterol ratio. J. Periodontol. **2021**, 92, 1509–1521.
61. Hutter, F.; Kotthoff, L.; Vanschoren, J. Automated Machine Learning: Methods, Systems, Challenges; Springer: Cham, Switzerland, 2019.
62. H2O AutoML. Available online: https://github.com/h2oai/h2o-3/tree/master (accessed on 26 November 2023).
63. Auto-Sklearn. Available online: https://automl.github.io/auto-sklearn/master (accessed on 26 November 2023).
64. Van der Laan, M.J.; Polley, E.C.; Hubbard, A.E. Super learner. Stat. Appl. Genet. Mol. Biol. **2007**, 6, 25.
65. Hutter, F.; Hoos, H.; Leyton-Brown, K. Sequential model-based optimization for general algorithm configuration. In Proceedings of the 5th International Conference on Learning and Intelligent Optimization (LION’11), Rome, Italy, 17–21 January 2011; pp. 507–523.
66. Covert, I.; Lee, S. Improving KernelSHAP: Practical Shapley Value Estimation Using Linear Regression. In Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, Virtual, 13–15 April 2021; pp. 3457–3465.
67. Covert, I.C.; Lundberg, S.; Lee, S. Understanding global feature contributions with additive importance measures. In Proceedings of the 34th International Conference on Neural Information Processing Systems (NIPS′20), Vancouver BC Canada, 6–12 December 2020; Curran Associates Inc.: Red Hook, NY, USA, 2020; pp. 17212–17223.
68. Shapley, L.S. A value for n-person games. In Contributions to the Theory of Games; Princeton University Press: Princeton, NJ, USA, 1953; Volume 2, pp. 307–317.
69. Molnar, C. Interpretable Machine Learning. A Guide for Making Black Box Models Explainable. 2023. Available online: https://christophm.github.io/interpretable-ml-book/ (accessed on 26 November 2023).
70. SHAP Framework. Available online: https://github.com/shap/shap (accessed on 26 November 2023).
71. Leary, S.P. Shapley-Value (ML Interpretability) Using H2O AutoML. Available online: https://github.com/SeanPLeary/shapley-values-h2o-example (accessed on 10 October 2023).
72. Zhang, S.; Wang, J.; Pei, L.; Liu, K.; Gao, Y.; Fang, H.; Zhang, R.; Zhao, L.; Sun, S.; Wu, J.; et al. Interpretability Analysis of One-Year Mortality Prediction for Stroke Patients Based on Deep Neural Network. IEEE J. Biomed. Health Inform. **2022**, 26, 1903–1910.
73. Tonetti, M.S.; Jepsen, S.; Jin, L.; Otomo-Corgel, J. Impact of the global burden of periodontal diseases on health, nutrition and wellbeing of mankind: A call for global action. J. Clin. Periodontol. **2017**, 44, 456–462.
74. Spectre, G.; Östenson, C.G.; Li, N.; Hjemdahl, P. Postprandial Platelet Activation Is Related to Postprandial Plasma Insulin Rather Than Glucose in Patients with Type 2 Diabetes. Diabetes **2012**, 61, 2380–2384.
75. Kivimäki, M.; Steptoe, A. Effects of stress on the development and progression of cardiovascular disease. Nat. Rev. Cardiol. **2018**, 15, 215–229.
76. Kivimäki, M.; Bartolomuci, A.; Kawarkhi, I. The multiple roles of life stress in metabolic disorders. Nat. Rev. Endocrinol. **2023**, 19, 10–27.
77. Lăzureanu, P.C.; Popescu, F.; Tudor, A.; Stef, L.; Negru, A.G.; Mihăilă, R. Saliva pH and Flow Rate in Patients with Periodontal Disease and Associated Cardiovascular Disease. Med. Sci. Monit. Int. Med. J. Exp. Clin. Res. **2021**, 27, e931362.
78. Lazureanu, P.C.; Popescu, F.G.; Stef, L.; Focsa, M.; Vaida, M.A.; Mihăilă, R. The Influence of Periodontal Disease on Oral Health Quality of Life in Patients with Cardiovascular Disease: A Cross-Sectional Observational Single-Center Study. Medicina **2022**, 58, 584.
79. Kaczmarek-Majer, K.; Casalino, G.; Castellano, G.; Dominiak, M.; Hryniewicz, O.; Kamińska, O.; Vessio, G.; Díaz-Rodríguez, N. PLENARY: Explaining black-box models in natural language through fuzzy linguistic summaries. Inf. Sci. **2022**, 614, 374–399.

**Figure 4.** Confusion matrices representing predictions vs. actuals on test data for each of the two prediction models: (**a**) H2O AutoML (XGBoost); (**b**) Auto-sklearn (RF).

**Figure 9.** Joint plots for Metabolic syndrome and EQ-5D-5L score variables. (**a**) Kernel density estimate plot; (**b**) regression plot.

**Figure 11.** Waterfall plot for a healthy subject without metabolic syndrome. (**a**) H2O AutoML (XGBoost), predicted probability = 0.31; (**b**) Auto-sklearn (RF), predicted probability = 0.25.

**Figure 12.** SHAP force plot for a healthy subject, without metabolic syndrome. (**a**) H2O AutoML (XGBoost), predicted probability = 0.31; (**b**) Auto-sklearn (RF), predicted probability = 0.25.

**Figure 13.** Waterfall plot for a patient diagnosed with metabolic syndrome. (**a**) H2O AutoML (XGBoost), predicted probability = 0.93; (**b**) Auto-sklearn (RF), predicted probability = 0.99.

**Figure 14.** SHAP force plot for a patient diagnosed with metabolic syndrome. (**a**) H2O AutoML (XGBoost), predicted probability = 0.93; (**b**) Auto-sklearn (RF), predicted probability = 0.99.

**Figure 15.** Contribution of the CV risk feature variable in the predictions made. f(x) represents the probability of predicting the True class (the existence of metabolic syndrome) for instance x. The red color indicates a positive contribution to the prediction of metabolic syndrome. (**a**) H2O AutoML (XGBoost); (**b**) Auto-sklearn (RF).

Paper | Data Set | Classifiers * | Metabolic Syndrome | Periodontal Disease | Explainability of the Prediction |
---|---|---|---|---|---|
[45] | 18,553 patients from the Temple University Kornberg School of Dentistry predoctoral clinics | XGBoost | - | target | yes |
[46] | 1333 Taiwanese adult patients | DT | target | - | no |
[47] | Metabolic data set from Kaggle repository, 12,012 records | SVM, KNN, DT, RF, AdaBoost, GB, SGB, CatBoost, XGBoost | target | - | yes |
[48] | 67,730 patients, Nanfang Hospital, China | XGBoost | target | - | yes |
[49] | Tlalpan 2020 cohort study data set, Mexico City, 2289 subjects | RF, C4.5, DNN | target | - | no |
[50] | Internal validation cohort, 6793 participants; external validation cohort, 7681 participants | ANN, CART, SVM | target | - | no |
[51] | KoGES cohort study, 3064 participants, Korea | KNN, Naïve Bayes, RF, DT, MLP, SVM | target | - | no |
[52] | 532 subjects, Toulouse University Hospital Centre, France | MLP | - | target | yes |
[53] | Internal validation, 3453 participants, Taiwan; external validation, 3685 participants, United States | AdaBoost, ANN, DT, GP, KNN, SVC, LDA, RF, Naïve Bayes | - | target | no |
[54] | 173,209 adults aged 40 years or older, South Korea | LR, DT, RF, XGBoost, TN | target | - | yes |
[55] | 2258 individuals | GBM, XGBoost, RF | feature | target | yes |
[56] | 6421 Japanese individuals | MLR | target | feature | no |
[57] | DOME study, 132,529 subjects | LR, XGBoost | feature | feature | no |
[58] | 2401 samples from the NHANES database | LR, MLP, KNN, SVM, RF, XGBoost, Naïve Bayes | target | - | no |
[59] | 103 patients, Department of Internal Medicine, University of Catania, Italy | LR | target | feature | no |
[60] | 1011 participants, Brazil | LR | target | feature | no |

**SHAP Wrapper for Auto-Sklearn Models**

Class:

```python
import pandas as pd
import shap


class SKLProbWrapper:
    """Expose an Auto-sklearn classifier as a function returning P(True class)."""

    def __init__(self, skl_model, feature_names):
        self.skl_model = skl_model
        self.feature_names = feature_names

    def predict_binary_prob(self, X):
        # A single observation may arrive as a Series: reshape it to one row.
        if isinstance(X, pd.Series):
            X = X.values.reshape(1, -1)
        self.dataframe = pd.DataFrame(X, columns=self.feature_names)
        self.predictions = self.skl_model.predict_proba(self.dataframe.values)
        return self.predictions.astype("float64")[:, -1]  # probability of True class
```

Use case:

```python
# model: the fitted Auto-sklearn classifier; dframe: DataFrame of feature variables
skl_wrapper = SKLProbWrapper(model, dframe.columns)
skl_explainer = shap.KernelExplainer(skl_wrapper.predict_binary_prob, dframe)
shap_values = skl_explainer(dframe)
```

Feature Variable | Recorded Values | Assigned Numerical Values | Count (Total = 296) | Percentage (%) |
---|---|---|---|---|
Gingival bleeding | no | 0 | 104 | 35.14 |
Gingival bleeding | yes | 1 | 192 | 64.86 |
Periodontal pockets depth | - | 0 | 96 | 32.43 |
Periodontal pockets depth | ≤3.5 | 1 | 68 | 22.97 |
Periodontal pockets depth | >3.5 | 2 | 96 | 32.43 |
Periodontal pockets depth | >5 | 3 | 36 | 12.17 |
Carotid atherosclerosis | no | 0 | 158 | 53.38 |
Carotid atherosclerosis | yes | 1 | 138 | 46.62 |
Dental control | no | 0 | 248 | 83.78 |
Dental control | yes | 1 | 48 | 16.22 |
Daily tooth brushing | irregular/occasional | 0 | 84 | 28.38 |
Daily tooth brushing | 1 per day | 1 | 136 | 45.94 |
Daily tooth brushing | 2 per day | 2 | 76 | 25.68 |
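The encoding in the table above can be written as a lookup from recorded values to numeric codes. The following is a minimal sketch: the dictionary is transcribed from the table, while the function name `encode_record` and the sample `patient` record are hypothetical and used only for illustration.

```python
# Recorded value -> assigned numerical value, transcribed from the table.
ENCODING = {
    "Gingival bleeding": {"no": 0, "yes": 1},
    "Periodontal pockets depth": {"-": 0, "≤3.5": 1, ">3.5": 2, ">5": 3},
    "Carotid atherosclerosis": {"no": 0, "yes": 1},
    "Dental control": {"no": 0, "yes": 1},
    "Daily tooth brushing": {
        "irregular/occasional": 0,
        "1 per day": 1,
        "2 per day": 2,
    },
}


def encode_record(record):
    """Map each recorded value to its numeric code.

    Unknown features or values raise KeyError, so a typo is not
    silently turned into a wrong code.
    """
    return {feature: ENCODING[feature][value] for feature, value in record.items()}


# Hypothetical patient record (not taken from the study data).
patient = {
    "Gingival bleeding": "yes",
    "Periodontal pockets depth": ">3.5",
    "Carotid atherosclerosis": "no",
    "Dental control": "no",
    "Daily tooth brushing": "1 per day",
}
encoded = encode_record(patient)
```

The codes for pocket depth and tooth brushing are ordinal, so a model may treat larger codes as "more" of the underlying quantity.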

- $TN$ (True Negative): the number of actual negative cases correctly predicted as negative.
- $TP$ (True Positive): the number of actual positive cases correctly predicted as positive.
- $FP$ (False Positive): the number of actual negative cases incorrectly predicted as positive.
- $FN$ (False Negative): the number of actual positive cases incorrectly predicted as negative.

$\text{Accuracy}=\frac{TN+TP}{TN+FP+TP+FN}$

$\text{Precision}=\frac{TP}{TP+FP}$

$\text{Recall}=\frac{TP}{TP+FN}$

$\text{F1 Score}=2\times\frac{\text{Precision}\times\text{Recall}}{\text{Precision}+\text{Recall}}$

$\text{Specificity}=\frac{TN}{TN+FP}$

$\text{Balanced accuracy}=\frac{\text{Recall}+\text{Specificity}}{2}$
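The six metrics above can be computed directly from the four confusion-matrix counts. The sketch below is illustrative only. The function name `classification_metrics` is hypothetical, and the example counts are an assumption: the article reports only FN = 6 for the H2O DRF model, and TP = 46, TN = 36, FP = 0 were chosen here because they reproduce that model's reported ratios.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the metrics defined above from confusion-matrix counts.

    No guard against empty classes is included: a zero denominator
    raises ZeroDivisionError.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tn + tp) / (tn + fp + tp + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "specificity": specificity,
        "balanced_accuracy": (recall + specificity) / 2,
    }


# Assumed counts (only FN = 6 is reported in the article).
drf = classification_metrics(tp=46, tn=36, fp=0, fn=6)
rounded = {name: round(value, 3) for name, value in drf.items()}
```

Balanced accuracy averages recall and specificity, so it is less sensitive than plain accuracy to an uneven split between positive and negative cases.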

AutoML Framework | Algorithm | Model Parameters and Hyperparameters |
---|---|---|
H2O AutoML | XGBoost | number_of_trees: 47, max_depth: 10, min_rows: 5, min_child_weight: 5, learn_rate: 0.3, eta: 0.3, sample_rate: 0.6, normalize_type: tree, distribution: bernoulli, grow_policy: depthwise, dmatrix_type: dense, booster: gbtree |
H2O AutoML | DRF | number_of_trees: 32, number_of_internal_trees: 32, model_size_in_bytes: 5639, min_depth: 4, max_depth: 7, mean_depth: 5.75, min_leaves: 7, max_leaves: 15, mean_leaves: 9.375 |
H2O AutoML | GBM | number_of_trees: 628, number_of_internal_trees: 628, model_size_in_bytes: 115250, min_depth: 1, max_depth: 7, mean_depth: 4.915605, min_leaves: 2, max_leaves: 15, mean_leaves: 9.91242 |
Auto-sklearn | RF | bootstrap: True, criterion: 'gini', max_depth: 'None', max_features: 0.5, max_leaf_nodes: 'None', min_impurity_decrease: 0, min_samples_leaf: 1, min_samples_split: 2, min_weight_fraction_leaf: 0 |
Auto-sklearn | MLP | activation_function: relu, alpha: 0.02847755502162456, beta_1: 0.9, beta_2: 0.999, early_stopping: train, epsilon: $10^{-8}$, hidden_layer_depth: 2, learning_rate_init: 0.000421568792103947, num_nodes_per_layer: 123, shuffle: True, solver: adam |
Auto-sklearn | ExtraTrees | bootstrap: False, criterion: entropy, max_features: 0.993803313878608, max_leaf_nodes: None, min_impurity_decrease: 0, min_samples_leaf: 2, min_samples_split: 20, min_weight_fraction_leaf: 0 |

AutoML Framework | Model | Precision | Recall | Accuracy | Specificity | Balanced Accuracy | F1 | Incorrect Classifications |
---|---|---|---|---|---|---|---|---|
H2O | XGBoost | 1 | 1 | 1 | 1 | 1 | 1 | 0 |
H2O | DRF | 1 | 0.885 | 0.932 | 1 | 0.942 | 0.939 | FN = 6 |
H2O | GBM | 1 | 0.769 | 0.864 | 1 | 0.885 | 0.870 | FN = 12 |
Auto-sklearn | RF | 0.929 | 1 | 0.955 | 0.889 | 0.944 | 0.963 | FP = 4 |
Auto-sklearn | MLP | 0.897 | 1 | 0.932 | 0.833 | 0.917 | 0.945 | FP = 6 |
Auto-sklearn | ExtraTrees | 0.867 | 1 | 0.909 | 0.778 | 0.889 | 0.929 | FP = 8 |

Feature | Mean Value |
---|---|
DMFT | 21.954 |
CPI | 2.682 |
Periodontal pockets | 1.5 |
Bleeding | 0.636 * |
Tooth brushing | 0.818 |
Dental control | 0.136 * |
Gingival attachment loss | 2.454 |
CV risk | 7.636 |
Carotid atherosclerosis | 0.477 * |
EQ-5D-5L score | 0.935 |


© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Boitor, O.; Stoica, F.; Mihăilă, R.; Stoica, L.F.; Stef, L.
Automated Machine Learning to Develop Predictive Models of Metabolic Syndrome in Patients with Periodontal Disease. *Diagnostics* **2023**, *13*, 3631.
https://doi.org/10.3390/diagnostics13243631
