This is an early access version; the complete PDF, HTML, and XML versions will be available soon.
Article

Extracting Behavioral Rules from Health Survey Data with Interpretable Models

Piotr Lasek

Faculty of Exact and Technical Sciences, University of Rzeszów, Al. Rejtana 16C, 35-959 Rzeszów, Poland
Appl. Sci. 2026, 16(12), 6146; https://doi.org/10.3390/app16126146
Submission received: 8 April 2026 / Revised: 4 June 2026 / Accepted: 6 June 2026 / Published: 17 June 2026
(This article belongs to the Special Issue Engineering Applications of Hybrid Artificial Intelligence Tools)

Abstract

This study investigates the use of interpretable machine learning techniques to identify behavioral and demographic patterns associated with diabetes, based on structured population survey data from the Canadian Community Health Survey (CCHS). A decision tree classifier was applied to a dataset comprising 16,824 respondents and 38 preprocessed features covering lifestyle, well-being, and sociodemographic factors. Its hyperparameters were tuned by grid search with five-fold stratified cross-validation; the tuned model achieved a test accuracy of 61.3% (mean 62.6% ± 0.6% across a 10 × 5 repeated stratified cross-validation). Feature importance analysis identified age, alcohol consumption patterns, daily energy expenditure, and physical activity as the most influential factors associated with diabetes status, and the top three features showed stable importance across all cross-validation folds. The model produced a set of 32 human-readable decision rules; a sensitivity analysis indicated that these rules are stable across encoding choices and cross-validation folds. Several model variants were evaluated: a class-weighted decision tree, a logistic regression baseline, an age-only decision tree, and a logistic regression using only age and sex. The class-weighted model improved minority-class recall (from 0.25 to 0.53) at the cost of overall accuracy. In the encoding sensitivity analysis, replacing ordinal label encoding of nominal variables with one-hot encoding produced virtually identical accuracy (61.4% vs. 61.3%), indicating that the main rules are not artifacts of the encoding choice. Although the classification accuracy is moderate and not significantly better than a majority-class baseline (McNemar’s test, p = 0.455), the extracted rules reproduced several known associations and revealed interactions between social and lifestyle variables.
These rules are intended as hypothesis-generating, population-level descriptors rather than validated clinical decision tools, and no causal inference is claimed. The approach illustrates the value of rule-based models for exploratory public health research.
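Each of the 32 decision rules mentioned in the abstract presumably corresponds to one root-to-leaf path of the fitted tree: the splits along the path form the IF part and the leaf's predicted class forms the THEN part, so a tree with k leaves yields k rules. The pure-Python sketch below illustrates that conversion on a hypothetical toy tree; the `Leaf`/`Split` classes, feature names, and thresholds are invented for illustration and are not the study's model.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Leaf:
    label: str    # predicted class at this leaf
    samples: int  # number of training respondents reaching the leaf


@dataclass
class Split:
    feature: str      # feature tested at this node
    threshold: float  # go left if value <= threshold, else right
    left: "Node"
    right: "Node"


Node = Union[Leaf, Split]


def extract_rules(node: Node, conditions: Optional[List[str]] = None) -> List[str]:
    """Return one human-readable IF-THEN rule per leaf (root-to-leaf path)."""
    conditions = conditions or []
    if isinstance(node, Leaf):
        premise = " AND ".join(conditions) if conditions else "TRUE"
        return [f"IF {premise} THEN {node.label} (n={node.samples})"]
    # Accumulate the split condition for each branch and recurse.
    left = extract_rules(node.left, conditions + [f"{node.feature} <= {node.threshold}"])
    right = extract_rules(node.right, conditions + [f"{node.feature} > {node.threshold}"])
    return left + right


# Hypothetical toy tree; features, thresholds, and counts are illustrative only.
tree = Split("age_group", 4.5,
             Leaf("no_diabetes", 900),
             Split("physical_activity", 1.5,
                   Leaf("diabetes", 300),
                   Leaf("no_diabetes", 450)))

for rule in extract_rules(tree):
    print(rule)
```

If the tree is fitted with scikit-learn (the abstract does not name a library), `sklearn.tree.export_text` prints a comparable indented dump of the same paths, from which such rules can be read off.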
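The comparison with the majority-class baseline uses McNemar’s test, which looks only at the test respondents on which the two classifiers disagree: b counts cases the tree gets right and the baseline gets wrong, c the reverse. The abstract does not state which variant was used, so the sketch below assumes the common chi-square form with continuity correction (one degree of freedom) and also shows the exact binomial form; the counts in the usage lines are hypothetical.

```python
import math


def mcnemar_chi2(b: int, c: int) -> tuple[float, float]:
    """Continuity-corrected McNemar test.

    b: cases model A classifies correctly and model B incorrectly
    c: cases model A classifies incorrectly and model B correctly
    Returns (statistic, two-sided p-value) from the chi-square
    distribution with 1 degree of freedom.
    """
    if b + c == 0:
        return 0.0, 1.0
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    # For 1 degree of freedom, P(X > s) = erfc(sqrt(s / 2)).
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p


def mcnemar_exact(b: int, c: int) -> float:
    """Exact two-sided McNemar p-value (binomial test with p = 0.5)."""
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical disagreement counts on a test set (not the study's numbers).
stat, p = mcnemar_chi2(b=40, c=30)
print(f"chi2 = {stat:.3f}, p = {p:.3f}")
print(f"exact p = {mcnemar_exact(40, 30):.3f}")
```

If the majority class is "no diabetes" (an assumption here), the baseline is right on every negative and wrong on every positive, so b reduces to the tree's true positives and c to its false positives. A large p-value, such as the reported 0.455, means the imbalance between b and c is well within what chance alone would produce.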
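The 10 × 5 repeated stratified cross-validation behind the 62.6% ± 0.6% figure splits the data into five class-stratified folds, uses each fold once as the test set, and repeats the whole procedure ten times with different shuffles, giving 50 scores. The pure-Python sketch below illustrates the mechanics under stated assumptions: `evaluate` is a hypothetical stand-in for fitting and scoring the tree, the labels are synthetic, and the ± value is taken to be a sample standard deviation, which the abstract does not specify.

```python
import random
import statistics
from collections import defaultdict
from typing import Callable, List, Sequence, Tuple


def stratified_folds(labels: Sequence[int], k: int, seed: int) -> List[List[int]]:
    """Split sample indices into k folds with (near-)equal class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds: List[List[int]] = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal this class's indices round-robin so each fold gets about 1/k of them.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def repeated_stratified_cv(
    labels: Sequence[int],
    evaluate: Callable[[List[int], List[int]], float],
    n_splits: int = 5,
    n_repeats: int = 10,
) -> Tuple[float, float, int]:
    """Return (mean score, sample standard deviation, number of scores)."""
    scores = []
    for repeat in range(n_repeats):
        # A different seed per repeat gives a different partition each time.
        folds = stratified_folds(labels, n_splits, seed=repeat)
        for f, test_idx in enumerate(folds):
            train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
            scores.append(evaluate(train_idx, test_idx))
    return statistics.mean(scores), statistics.stdev(scores), len(scores)


# Synthetic toy labels (1 = diabetes, 0 = no diabetes); not the CCHS data.
labels = [1] * 40 + [0] * 60


def majority_baseline(train_idx: List[int], test_idx: List[int]) -> float:
    """Hypothetical evaluator: predict the most frequent training label."""
    train_labels = [labels[i] for i in train_idx]
    majority = max(set(train_labels), key=train_labels.count)
    return sum(labels[i] == majority for i in test_idx) / len(test_idx)


mean, sd, n = repeated_stratified_cv(labels, majority_baseline)
print(f"{mean:.3f} ± {sd:.3f} over {n} folds")  # prints: 0.600 ± 0.000 over 50 folds
```

With a real classifier, `evaluate` would fit the tree on `train_idx` and return its accuracy on `test_idx`. scikit-learn offers the same splitting scheme as `RepeatedStratifiedKFold(n_splits=5, n_repeats=10)`.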
Keywords: medical data exploration; diabetes; health informatics; decision trees; machine learning in healthcare; interpretable machine learning

Share and Cite

MDPI and ACS Style

Lasek, P. Extracting Behavioral Rules from Health Survey Data with Interpretable Models. Appl. Sci. 2026, 16, 6146. https://doi.org/10.3390/app16126146

AMA Style

Lasek P. Extracting Behavioral Rules from Health Survey Data with Interpretable Models. Applied Sciences. 2026; 16(12):6146. https://doi.org/10.3390/app16126146

Chicago/Turabian Style

Lasek, Piotr. 2026. "Extracting Behavioral Rules from Health Survey Data with Interpretable Models" Applied Sciences 16, no. 12: 6146. https://doi.org/10.3390/app16126146

APA Style

Lasek, P. (2026). Extracting Behavioral Rules from Health Survey Data with Interpretable Models. Applied Sciences, 16(12), 6146. https://doi.org/10.3390/app16126146

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
