Lowering Barriers to Health Risk Assessments in Promoting Personalized Health Management

This study investigates whether adverse health events can be predicted accurately without relying on costly data acquisition methods such as laboratory tests. The question is timely as healthcare paradigms shift towards community-based health promotion and personalized preventive healthcare through individual health risk assessments (HRAs). We assessed the incremental predictive value of four categories of predictor variables (demographic, lifestyle and family history, personal health device, and laboratory data), ordered by data acquisition cost, in predicting the risks of mortality and five chronic diseases. Machine learning methods were used to develop risk prediction models, assess their predictive performance, and determine feature importance. Using data from the National Sample Cohort of the Korean National Health Insurance Service (NHIS), which includes eligibility, medical check-up, healthcare utilization, and mortality data from 2002 to 2019, our study included 425,148 NHIS members who underwent medical check-ups between 2009 and 2012. Models using demographic, lifestyle, family history, and personal health device data performed comparably with or without laboratory data. A feature importance analysis of the models excluding laboratory data highlighted modifiable lifestyle factors, which are better suited to developing health guidelines. Our findings support the practicality of precise HRAs using demographic, lifestyle, family history, and personal health device data. By eliminating the need for costly and inconvenient laboratory data collection, this approach lowers barriers to HRAs, particularly for healthy individuals, and advances accessible preventive health management strategies.


Introduction
Recent advancements in biomedicine and information technology have catalyzed a paradigm shift in healthcare, from treating the sick in healthcare facilities to preventing illness in healthy individuals through personalized health management in communities, a concept central to P4 (predictive, preventive, personalized, and participatory) medicine [1]. This approach aims to identify disease susceptibility early and prevent disease progression through tailored healthcare interventions [2]. The success of P4 medicine increasingly relies on precise health risk assessments (HRAs) that leverage data science, wearable technology, and the Internet of Things (IoT) to predict individual health risks and potential mortality [2,3].
HRAs were originally developed in the late 1940s and have evolved significantly since the mid-2000s, with their applications shifting from clinical settings to community health promotion programs [4][5][6][7][8][9]. They typically combine questionnaires on demographic details, lifestyle factors, and medical history with physiological data to gauge individual health risks [10,11]. Nonetheless, evidence supporting the predictive accuracy of HRA instruments remains limited. This is largely due to the scarcity of linked data connecting assessment inputs to health outcomes over extended time frames, and to the need to justify data collection costs against the resulting gains in prediction precision, which would in turn broaden the instruments' utility [12].
HRA is a systematic process for evaluating an individual's health risks based on factors including lifestyle, medical history, and biomarkers [13]. However, obtaining these data, especially from healthy individuals, is challenging, as evidenced by low participation rates in wellness programs; for example, only 24% of Medicare fee-for-service beneficiaries participated in the Annual Wellness Visit in 2017 [14].
The objective of this study was to investigate the feasibility of conducting HRAs without relying on high-cost data such as laboratory tests, which often require visits to healthcare facilities. Using machine learning methods, we compared the predictive performance of HRA models with and without laboratory data and analyzed the feature importance of the models to gain insights for developing personalized health management guidelines. The results indicated that models using demographic, lifestyle, family history, and personal health device data performed comparably with or without laboratory data. Moreover, the important features identified by the models without laboratory data emphasized modifiable lifestyle factors, making them more useful for developing health guidelines. These findings could make personalized health management more accessible to healthy individuals, thereby supporting the broader implementation of P4 medicine.
The NHIS administers biennial medical check-ups for beneficiaries aged 40 and above through the National Health Screening Program (NHSP). The program also covers younger blue-collar workers and household heads, and individuals in high-risk work environments are eligible for annual check-ups. The NHSP includes laboratory tests and self-reported questionnaires on health behavior and medical history.
The insurance claims dataset, processed by the Health Insurance Review and Assessment Service (HIRA), includes patient identification, provider information, service descriptions, diagnoses (ICD-10 codes), and total charges. The death registry dataset, sourced from Statistics Korea [16], records the date and cause of death. In line with the criteria of Kwon et al. [17], we excluded deaths due to external causes such as accidents or suicides.
Our initial dataset comprised 489,461 records from cohort members who underwent medical check-ups between 2009 and 2012. After applying the exclusion criteria, the final dataset included 425,148 records. We excluded records of individuals under 30 years old (as the NHSP primarily targets adults over 40), records with character values in the birth year field, records with missing data, and records with extreme values indicating probable typographical errors. Figure 1 illustrates the schematic diagram of the study dataset.
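The exclusion steps above can be expressed as a simple record filter. This is a minimal sketch, assuming each record is a Python dict with a `checkup_year` and a raw `birth_year` field and that age is check-up year minus birth year; all field names and the plausible-value ranges are hypothetical, since the cut-offs used to flag probable typographical errors are not reported.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical plausible-value ranges for flagging probable typographical
# errors; the study does not report its actual cut-offs.
PLAUSIBLE_RANGES: Dict[str, Tuple[float, float]] = {
    "height_cm": (100.0, 220.0),
    "weight_kg": (25.0, 250.0),
    "sbp_mmhg": (60.0, 260.0),
}


def keep_record(rec: Dict[str, Any], min_age: int = 30) -> bool:
    """Return True if a check-up record survives all exclusion criteria."""
    # Missing data: any empty field excludes the record.
    if any(value is None for value in rec.values()):
        return False
    # Character values in the birth year field.
    birth = str(rec["birth_year"]).strip()
    if not birth.isdigit():
        return False
    # Age at check-up (assumed: check-up year minus birth year).
    if int(rec["checkup_year"]) - int(birth) < min_age:
        return False
    # Extreme values indicating probable typographical errors.
    for field, (low, high) in PLAUSIBLE_RANGES.items():
        if field in rec and not low <= float(rec[field]) <= high:
            return False
    return True


def apply_exclusions(records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    return [rec for rec in records if keep_record(rec)]


records = [
    {"birth_year": "1960", "checkup_year": 2010, "height_cm": 170.0, "weight_kg": 70.0},
    {"birth_year": "19X0", "checkup_year": 2010, "height_cm": 165.0, "weight_kg": 60.0},
    {"birth_year": "1990", "checkup_year": 2010, "height_cm": 175.0, "weight_kg": 80.0},
    {"birth_year": "1955", "checkup_year": 2011, "height_cm": None, "weight_kg": 65.0},
    {"birth_year": "1950", "checkup_year": 2012, "height_cm": 17.0, "weight_kg": 65.0},
]
kept = apply_exclusions(records)
print(len(kept))  # 1
```

In this toy list, only the first record survives: the others fail, in order, the birth-year, age, missing-data, and extreme-value checks.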

Variables
In this study, we evaluated the health risk of individuals by quantifying the likelihood of future adverse health events, such as mortality and chronic diseases, within a predefined time frame. To do so, we used machine learning models to predict the incidence of these events.
We categorized predictor variables into five groups: demographic variables (DEMO), lifestyle variables encompassing health behaviors and body measurements (LS), family history variables (FH), personal health device variables (PHD), and laboratory variables (LAB). Table 1 presents the definitions, notation, and descriptions of the predictor variables, together with their descriptive statistics and frequency distributions for the male and female datasets.

In recent studies, lifestyle variables, which have the potential to guide personal lifestyle interventions to prevent or treat adverse health events, encompass multiple interconnected aspects such as body weight, body mass index (BMI), and waist circumference [18,19]. Previous studies addressing the global burden of disease have considered smoking, alcohol intake, and substance use as behavioral risk factors in efforts to mitigate health-related losses [20]. Our study incorporates two kinds of lifestyle variables: body measurements, which are taken during medical check-ups, and health behavior variables, which are derived from self-reported responses to the NHIS-NSC questionnaires administered during medical check-ups. We defined health behavior variables across three domains that directly influence an individual's health status: smoking, alcohol intake, and physical activity [21]. Smoking amount (SMK) is the cumulative amount smoked over an individual's lifetime in pack-years, computed with Equation (1) [22]. Alcohol intake (DRK) is the weekly amount of alcohol consumed in bottles, computed with Equation (2) [23]. Physical activity (PA) is computed from the NHIS-NSC questionnaire items on light, moderate, and vigorous activity, converted into metabolic equivalents (METs) with Equation (3) [24].
$$\mathrm{SMK}\ (\text{pack-years}) = (\text{cigarettes per day}) \times 0.05 \times (\text{years smoked}) \tag{1}$$
$$\mathrm{DRK}\ (\text{bottles/week}) = (\text{mean alcohol intake per day, g}) \times 0.02 \times (\text{drinking days per week}) \tag{2}$$
$$\mathrm{PA}\ (\text{METs}) = d_{\text{light}} \times 2.9 \times 30 + d_{\text{moderate}} \times 4 \times 30 + d_{\text{vigorous}} \times 7 \times 20 \tag{3}$$
where $d_{\text{light}}$, $d_{\text{moderate}}$, and $d_{\text{vigorous}}$ are the numbers of days per week with light, moderate, and vigorous activity, respectively.
Family history variables are binary (0/1) indicators for four adverse health events: heart disease, stroke, hypertension, and diabetes. These variables are derived from self-reported survey responses obtained during medical check-ups.
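Equations (1)-(3) translate directly into code. The following minimal Python sketch restates the three formulas with the conversion constants made explicit; the function and argument names are hypothetical.

```python
def smoking_pack_years(cigarettes_per_day: float, years_smoked: float) -> float:
    """Equation (1): 0.05 = 1/20, i.e. 20 cigarettes per pack."""
    return cigarettes_per_day * 0.05 * years_smoked


def alcohol_bottles_per_week(mean_grams_per_day: float, drinking_days_per_week: float) -> float:
    """Equation (2): 0.02 = 1/50, i.e. 50 g of alcohol counted as one bottle."""
    return mean_grams_per_day * 0.02 * drinking_days_per_week


def physical_activity_mets(light_days: int, moderate_days: int, vigorous_days: int) -> float:
    """Equation (3): days x MET value x minutes per session, summed over intensities."""
    return light_days * 2.9 * 30 + moderate_days * 4 * 30 + vigorous_days * 7 * 20


print(smoking_pack_years(20, 10))        # 10.0
print(alcohol_bottles_per_week(50, 3))   # 3.0
print(physical_activity_mets(3, 2, 1))   # 261 + 240 + 140, up to float rounding
```

Because each term of Equation (3) multiplies days per week by a MET value and a session length in minutes, the PA score is effectively expressed in MET-minutes per week.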
With the proliferation of technology and the increased accessibility of medical wearable devices, a growing body of clinical data is now available outside traditional clinical settings. Both patients and healthy individuals use such data to manage their health at home. We refer to this subset of variables as personal health device variables (PHD); in this study, they comprise blood pressure (BP) and fasting blood sugar (FBS). We computed the PHD variables from medical check-up data.
We also define a category of laboratory variables (LAB), encompassing measurements obtained from blood and urine samples analyzed in clinical laboratories. These include biomarkers such as cholesterol, aspartate aminotransferase (AST), and hemoglobin (HGB). We computed the LAB variables from medical check-up data.
Our study focused on predicting mortality and the incidence of five major chronic diseases: heart disease, stroke, cancer, hypertension, and diabetes. These diseases are prominent contributors to global morbidity, disability, and mortality, and they impose substantial individual and socioeconomic burdens due to their prolonged management and associated costs [25,26]. By predicting these adverse health events, our models aim to facilitate early detection and personalized risk management, thereby improving public health outcomes and the cost-effectiveness of the healthcare system [27].
The study dataset, compiled from medical check-up data (2009-2012) and claims data (2002-2019), was analyzed to identify these adverse health events. We assessed three prediction timeframes, namely three, five, and ten years, each starting from the year following the medical check-up. The analysis included only records free of recorded health issues up to the check-up year: we excluded records of adverse health events in or before the check-up year and records indicating mortality within the prediction timeframe.
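The windowing rule above can be sketched as a labelling function for one record and one prediction timeframe. This is a minimal sketch, not the study's code. Two points are our assumptions and are marked in the comments: the death-in-window exclusion is applied only to non-mortality outcomes, and windows extending past the last data year (2019) are left unlabelled.

```python
from typing import Optional


def outcome_label(
    checkup_year: int,
    event_year: Optional[int],
    horizon: int,
    death_year: Optional[int] = None,
    is_mortality: bool = False,
    last_data_year: int = 2019,
) -> Optional[int]:
    """Label one record for a `horizon`-year prediction (3, 5, or 10).

    Returns 1 if the event first occurs within the window, 0 if it does not,
    and None if the record is excluded from this prediction.
    """
    window_start = checkup_year + 1  # window starts the year after the check-up
    window_end = checkup_year + horizon
    # Assumption: windows running past the end of the data are not labelled.
    if window_end > last_data_year:
        return None
    # Exclude records with the event in or before the check-up year.
    if event_year is not None and event_year <= checkup_year:
        return None
    # Exclude records indicating death within the window
    # (assumption: applied only when the outcome is not mortality itself).
    if not is_mortality and death_year is not None and window_start <= death_year <= window_end:
        return None
    return int(event_year is not None and window_start <= event_year <= window_end)


print(outcome_label(2010, 2012, 3))  # 1: event in 2011-2013 window
print(outcome_label(2010, 2010, 3))  # None: prevalent at check-up
```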
Chronic disease incidence was determined from the ICD-10 diagnosis codes in the claims data and the laboratory test results from medical check-ups. As in prior studies [28][29][30][31], heart disease was identified when ICD-10 codes I20-I25 were recorded as a principal or secondary diagnosis in the claims data, and stroke when codes I60-I69 were recorded. Similarly, cancer incidence was detected from principal or secondary diagnoses in the claims data, focusing on the five most common cancer types by gender [32]: lung (C33, C34), gastric (C16), colorectal (C18, C19, C20), prostate (C61), and liver (C22) cancer for males, and breast (C50, D05), colorectal (C18, C19, C20), gastric (C16), lung (C33, C34), and liver (C22) cancer for females. Thyroid cancer was excluded from the adverse health events in this study because its five-year survival rate in Korea exceeds 99% [33]. Hypertension incidence was established when BP ≥ 140/90 mmHg was recorded in a medical check-up or when ICD-10 codes I10-I15 were recorded as a principal or secondary diagnosis in the claims data during the data search period. Diabetes incidence was established when fasting glucose ≥ 126 mg/dL was recorded in a medical check-up or when ICD-10 codes R81 or E10-E14 were recorded as a principal or secondary diagnosis in the claims data. Table 2 presents the number of records and the prevalence of adverse health events in the male and female datasets.
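The outcome definitions above amount to a small rule table over ICD-10 categories and check-up thresholds. The sketch below is hypothetical: the function names and data layout are ours, and it assumes "BP ≥ 140/90 mmHg" means systolic ≥ 140 or diastolic ≥ 90. Codes are reduced to their three-character category, so both "I21.9" and "I219" map to I21.

```python
import re
from typing import Iterable, Optional, Set, Tuple

CodeKey = Tuple[str, int]

# Three-character ICD-10 ranges (letter, first, last) from the definitions above.
CODE_RANGES = {
    "heart_disease": [("I", 20, 25)],
    "stroke": [("I", 60, 69)],
    "hypertension": [("I", 10, 15)],
    "diabetes": [("R", 81, 81), ("E", 10, 14)],
}
# Five most common cancers by gender; thyroid cancer (C73) is deliberately absent.
CANCER_CODES = {
    "male": {("C", 33), ("C", 34), ("C", 16), ("C", 18), ("C", 19), ("C", 20),
             ("C", 61), ("C", 22)},
    "female": {("C", 50), ("D", 5), ("C", 18), ("C", 19), ("C", 20), ("C", 16),
               ("C", 33), ("C", 34), ("C", 22)},
}


def code_key(code: str) -> Optional[CodeKey]:
    """Reduce an ICD-10 code such as 'I21.9' to its category ('I', 21)."""
    match = re.match(r"([A-Z])(\d{2})", code.strip().upper())
    return (match.group(1), int(match.group(2))) if match else None


def detect_outcomes(diagnoses: Iterable[str], gender: str,
                    sbp: Optional[float] = None, dbp: Optional[float] = None,
                    fbs: Optional[float] = None) -> Set[str]:
    """Outcomes implied by claims diagnoses (principal or secondary) and check-up values."""
    keys = {key for key in map(code_key, diagnoses) if key is not None}
    found: Set[str] = set()
    for outcome, ranges in CODE_RANGES.items():
        if any(letter == r_letter and first <= num <= last
               for letter, num in keys for r_letter, first, last in ranges):
            found.add(outcome)
    if keys & CANCER_CODES[gender]:
        found.add("cancer")
    # Assumption: "BP >= 140/90 mmHg" read as SBP >= 140 or DBP >= 90.
    if (sbp is not None and sbp >= 140) or (dbp is not None and dbp >= 90):
        found.add("hypertension")
    if fbs is not None and fbs >= 126:  # fasting glucose in mg/dL
        found.add("diabetes")
    return found


print(sorted(detect_outcomes(["I21.9", "E11.9"], "male")))  # ['diabetes', 'heart_disease']
print(detect_outcomes(["C61"], "female"))                   # set()
```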

Analytical Models
We designed this study to evaluate whether including predictor variables with higher acquisition costs improves the predictive accuracy of HRAs for the personalized prediction and prevention of adverse health events. We introduced groups of variables one tier at a time (Models 1-4 in Figure 1) to assess the incremental predictive accuracy gained by each addition. Conceptually, data acquisition costs reflect the financial and logistical burden of obtaining the data, as well as the discomfort and inconvenience experienced by individuals during acquisition. We posited that acquiring laboratory data would be the most resource-intensive and cumbersome, because it requires individuals to undergo procedures involving needles and blood draws [34]. Given the significantly different characteristics of the male and female datasets (Tables 1 and 2), we conducted separate analyses for each gender.
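The nested model design can be made concrete by building each model's feature list cumulatively from the cost-ordered tiers. In this sketch, AGE, WT, BMI, WC, SMK, DRK, PA, SBP, DBP, FBS, AST, and HGB are abbreviations that appear in the text. The FH names and `CHOL` are hypothetical placeholders, and each list is only an illustrative subset of the Table 1 variables.

```python
from typing import Dict, List

# Illustrative subsets of each variable group; FH names and CHOL are hypothetical.
GROUPS: Dict[str, List[str]] = {
    "DEMO": ["AGE"],
    "LS": ["WT", "BMI", "WC", "SMK", "DRK", "PA"],
    "FH": ["FH_HEART", "FH_STROKE", "FH_HTN", "FH_DM"],
    "PHD": ["SBP", "DBP", "FBS"],
    "LAB": ["CHOL", "AST", "HGB"],
}

# Tiers ordered by data acquisition cost; each model adds one tier.
TIERS: List[List[str]] = [["DEMO"], ["LS", "FH"], ["PHD"], ["LAB"]]


def nested_feature_sets(groups: Dict[str, List[str]],
                        tiers: List[List[str]]) -> Dict[str, List[str]]:
    """Model k uses every variable group from tiers 1..k."""
    models: Dict[str, List[str]] = {}
    current: List[str] = []
    for number, tier in enumerate(tiers, start=1):
        for group in tier:
            current = current + groups[group]
        models[f"Model {number}"] = list(current)
    return models


for name, features in nested_feature_sets(GROUPS, TIERS).items():
    print(name, len(features))
```

Because each model is a strict superset of the previous one, any change in performance between consecutive models can be attributed to the newly added tier.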
Comparing the models allowed us to examine the incremental predictive accuracy contributed by each group of variables. The models were trained on 70% of the dataset and tested on the remaining 30%. The evaluation metrics were the area under the receiver operating characteristic curve (AUC), accuracy, and F1-score [35]. We used Youden's J statistic (sensitivity + specificity - 1) to select the classification threshold at which accuracy and F1-score were computed. The significance of AUC differences between models was assessed using DeLong's method [36,37].
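The evaluation steps above (AUC, a Youden-optimal cut-off, then accuracy and F1-score at that cut-off) can be sketched in plain Python. This is an illustration only, not the study's R code. The AUC is computed by direct pairwise comparison, which is O(n²) and fine for a toy example but far too slow for a cohort of this size. The toy labels and scores are made up.

```python
from typing import Sequence, Tuple


def roc_auc(y: Sequence[int], s: Sequence[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative), ties = 1/2."""
    pos = [si for yi, si in zip(y, s) if yi == 1]
    neg = [si for yi, si in zip(y, s) if yi == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def youden_threshold(y: Sequence[int], s: Sequence[float]) -> float:
    """Cut-off maximizing J = sensitivity + specificity - 1 (positive if score >= t)."""
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    best_t, best_j = s[0], float("-inf")
    for t in sorted(set(s)):
        tp = sum(1 for yi, si in zip(y, s) if yi == 1 and si >= t)
        tn = sum(1 for yi, si in zip(y, s) if yi == 0 and si < t)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:  # strict: keeps the lowest cut-off among exact ties
            best_t, best_j = t, j
    return best_t


def accuracy_f1(y: Sequence[int], s: Sequence[float], t: float) -> Tuple[float, float]:
    """Accuracy and F1-score after classifying scores at cut-off t."""
    pred = [1 if si >= t else 0 for si in s]
    tp = sum(1 for yi, pi in zip(y, pred) if yi == 1 and pi == 1)
    fp = sum(1 for yi, pi in zip(y, pred) if yi == 0 and pi == 1)
    fn = sum(1 for yi, pi in zip(y, pred) if yi == 1 and pi == 0)
    accuracy = sum(1 for yi, pi in zip(y, pred) if yi == pi) / len(y)
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return accuracy, f1


y = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]
t = youden_threshold(y, scores)
print(round(roc_auc(y, scores), 3), t, accuracy_f1(y, scores, t))
```

On this toy example, the Youden cut-off is 0.35, which gives an accuracy of 5/6 and an F1-score of 6/7.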
Our primary analytical tool was XGBoost, known for its strong predictive performance [38][39][40][41][42]. To validate the XGBoost results, we also applied logistic regression with stepwise variable selection. Hyperparameters were optimized by grid search [43][44][45], comparing the predictive performance of multiple hyperparameter combinations [46] with 10-fold cross-validation. To evaluate the importance of each predictor variable, we used the gain-based feature importance provided by XGBoost. All computations were performed in R version 4.3.0.
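Grid search with k-fold cross-validation follows a generic pattern: for every hyperparameter combination, average a validation score over the folds and keep the best combination. The sketch below assumes a hypothetical callback, `fit_and_score`, that would train a model on the training indices and return, for example, the validation AUC. `max_depth` and `eta` are genuine XGBoost parameter names, but the grid values here are hypothetical; the study's grid is not reported.

```python
import itertools
import random
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

Params = Dict[str, object]


def kfold_indices(n: int, k: int = 10, seed: int = 0) -> List[List[int]]:
    """Shuffle 0..n-1 and deal the indices into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def grid_search_cv(
    n: int,
    grid: Dict[str, Sequence[object]],
    fit_and_score: Callable[[Params, List[int], List[int]], float],
    k: int = 10,
    seed: int = 0,
) -> Tuple[Params, float]:
    """Return the parameter combination with the best mean validation score."""
    folds = kfold_indices(n, k, seed)
    names = sorted(grid)
    best_params: Params = {}
    best_score = float("-inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = []
        for i, valid in enumerate(folds):
            train = [j for f, fold in enumerate(folds) if f != i for j in fold]
            scores.append(fit_and_score(params, train, valid))
        score = mean(scores)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params: Params, train: List[int], valid: List[int]) -> float:
    # Stand-in for "train XGBoost on `train`, return AUC on `valid`".
    return -abs(params["max_depth"] - 4) - abs(params["eta"] - 0.1)


best, score = grid_search_cv(100, {"max_depth": [2, 4, 6], "eta": [0.05, 0.1, 0.3]}, toy_score)
print(best)  # {'eta': 0.1, 'max_depth': 4}
```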

Results
Table 1 presents the categories and definitions of the predictor variables, along with descriptive statistics for continuous variables and frequency distributions for binary categorical variables, for males and females. All differences between the male and female data were statistically significant at α = 0.01. The average age was 48.8 years for male records and 51.7 years for female records. Table 2 presents the number of records used in the prediction models and the prevalence of adverse health events for each prediction timeframe, with the significance levels of the male-female differences in prevalence. The lowest prevalence was observed for cancer in the three-year prediction period (1.09% for males and 0.62% for females), and the highest for hypertension in the ten-year prediction period (28.02% for males and 23.57% for females).

Incremental Predictive Performance Achieved by the Inclusion of Groups of Predictor Variables
We analyzed the predictive performance of four models (Models 1-4), as detailed in Figure 2 and Table A1 in Appendix A, evaluating their AUC, accuracy, and F1-score on the testing datasets. To assess the impact of incorporating each group of predictor variables, we measured changes in model performance before and after its addition. DeLong's test results for the significance of AUC differences are provided in Table A1, with logistic regression results for comparison in Table A2 in Appendix A. The AUC, a measure of a model's ability to distinguish between records with and without an adverse health event, ranged from 0.623 (three-year hypertension prediction for males, Model 1) to 0.897 (five-year mortality prediction for males, Model 4). Models 3 and 4 consistently achieved AUCs above 0.7 in all predictions except female cancer predictions. Accuracy, the proportion of correct predictions, varied from 0.482 (ten-year cancer prediction for females, Model 4) to 0.830 (ten-year mortality prediction for males, Model 4). The F1-score, the harmonic mean of precision and recall, ranged from 0.020 (three-year cancer prediction for females, Model 1) to 0.533 (ten-year hypertension prediction for males, Model 4). Figure 2 also shows that F1-scores improved with longer prediction timeframes, a pattern evident in the hypertension and diabetes predictions but not in the mortality and cancer predictions.
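Whether one model's AUC is significantly higher than another's on the same test records is what DeLong's test assesses. Below is a minimal pure-Python sketch of the paired DeLong test for two correlated AUCs. It is our illustration, not the study's implementation: it computes the structural components by direct O(m·n) comparison, so a faster midrank-based implementation would be needed at cohort scale. In R, the study's language, the pROC package's `roc.test()` offers a DeLong option. The toy scores are made up.

```python
import math
from typing import List, Sequence, Tuple


def _psi(x: float, y: float) -> float:
    """Kernel: 1 if the positive outscores the negative, 0.5 for ties, else 0."""
    return 1.0 if x > y else 0.5 if x == y else 0.0


def _components(pos: List[float], neg: List[float]) -> Tuple[List[float], List[float]]:
    """DeLong structural components V10 (per positive) and V01 (per negative)."""
    v10 = [sum(_psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(_psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01


def _cov(a: List[float], b: List[float]) -> float:
    """Sample covariance with an n - 1 denominator."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


def delong_paired(y: Sequence[int], s1: Sequence[float],
                  s2: Sequence[float]) -> Tuple[float, float, float]:
    """Two-sided DeLong test for two AUCs on the same records: (auc1, auc2, p)."""
    pos_idx = [i for i, label in enumerate(y) if label == 1]
    neg_idx = [i for i, label in enumerate(y) if label == 0]
    m, n = len(pos_idx), len(neg_idx)
    v10_1, v01_1 = _components([s1[i] for i in pos_idx], [s1[i] for i in neg_idx])
    v10_2, v01_2 = _components([s2[i] for i in pos_idx], [s2[i] for i in neg_idx])
    auc1, auc2 = sum(v10_1) / m, sum(v10_2) / m
    # Var(AUC1 - AUC2) = c' (S10 / m + S01 / n) c with contrast c = (1, -1).
    var = ((_cov(v10_1, v10_1) + _cov(v10_2, v10_2) - 2 * _cov(v10_1, v10_2)) / m
           + (_cov(v01_1, v01_1) + _cov(v01_2, v01_2) - 2 * _cov(v01_1, v01_2)) / n)
    if var <= 0.0:
        return auc1, auc2, 1.0  # identical components: no detectable difference
    z = (auc1 - auc2) / math.sqrt(var)
    return auc1, auc2, math.erfc(abs(z) / math.sqrt(2.0))


y = [0, 0, 1, 1, 0, 1]
model_a = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]
model_b = [0.5] * 6  # uninformative scores
auc_a, auc_b, p = delong_paired(y, model_a, model_b)
print(round(auc_a, 3), auc_b)  # 0.889 0.5
```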
Adding LS and FH variables in Model 2 improved the AUC for most adverse health event predictions, especially for hypertension and diabetes, demonstrating the value of these variables for prediction accuracy. Model 3, which added PHD variables to the DEMO, LS, and FH variables, showed higher AUC values across most predictions, with marked improvements in hypertension and diabetes predictions for both genders. However, moving from Model 3 to Model 4 by adding LAB variables yielded relatively modest AUC improvements and limited gains in accuracy and F1-score. These findings suggest that LAB variables, despite their high acquisition cost, contributed only marginally to the overall predictive performance for most health risks.
In summary, our analysis demonstrates that while the addition of LS, FH, and PHD variables significantly enhanced the predictive efficacy of our models, the incremental gain from incorporating LAB variables was limited, indicating a nuanced balance between data acquisition costs and the predictive performance in health risk assessments.

Feature Importance
Table 3 presents the top five features in each prediction model and the proportion of model variance explained by each feature. This analysis informs the development of personalized health promotion guidelines based on individual HRA results. In particular, we examined the impact of introducing laboratory variables by comparing the key features of Model 3 (without LAB variables) and Model 4 (with LAB variables).
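One common way to express gain-based importance as a proportion is to divide each feature's total split gain by the sum over all features. The sketch below assumes that convention, since the study's exact normalization is not reported. With the xgboost Python package, a dictionary of this shape can be obtained from `Booster.get_score(importance_type="total_gain")`. The numbers below are illustrative only and are not results from this study.

```python
from typing import Dict, List, Tuple


def top_features_by_gain(total_gain: Dict[str, float], k: int = 5) -> List[Tuple[str, float]]:
    """Rank features by their share of the model's total split gain."""
    total = sum(total_gain.values())
    shares = [(feature, gain / total) for feature, gain in total_gain.items()]
    shares.sort(key=lambda item: item[1], reverse=True)
    return shares[:k]


# Illustrative gains only (not study results).
example_gain = {"AGE": 50.0, "SBP": 30.0, "BMI": 10.0, "WC": 5.0, "SMK": 3.0, "PA": 2.0}
for feature, share in top_features_by_gain(example_gain):
    print(f"{feature}: {share:.2f}")
```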
AGE consistently emerges as a dominant feature in most predictions. However, SBP and FBS were more important in predicting hypertension and diabetes, respectively. Body measurements such as WC and BMI were prominent predictors of heart disease and stroke, while WT was important for mortality. Health behavior variables such as SMK (for heart disease), PA (for females in Model 2), and DRK (for three-year cancer in females) were notable predictors of certain health risks. LAB variables showed varying levels of importance, but their overall contribution to predictive performance was limited when combined with other variable groups. Family history variables consistently appeared among the top five features for various health risks, albeit at lower ranks.

Discussion
In our study, we categorized the predictor variables of HRA models into four tiers based on acquisition costs: demographic (DEMO), lifestyle (LS) and family history (FH), personal health device (PHD), and laboratory (LAB) variables. This categorization enabled us to evaluate the incremental predictive performance of each tier and to weigh data acquisition costs against the predictive effectiveness of HRA models.
Our results demonstrated that Model 3, incorporating DEMO, LS, FH, and PHD variables, had predictive performance comparable to Model 4, which added LAB variables, across various adverse health events and prediction timeframes. Even Model 2, which included only DEMO, LS, and FH variables, performed well in most predictions. However, the accuracy measures, especially for stroke predictions in females, tended to decrease with the addition of PHD and LAB variables. These findings suggest that excluding costly and inconvenient LAB variables from HRAs does not significantly impair predictive performance, potentially enhancing the accessibility and widespread adoption of personalized healthcare.
Feature importance analyses reinforced the well-established connections between health behavior and outcomes [20,47,48]. Notably, the important features from Models 2 and 3, which include modifiable factors such as WC and BMI, provided more useful insights for health guideline development than those from Model 4. The associations between PHD variables (SBP, DBP, FBS) and LS variables imply that lifestyle changes can influence PHD variables.
All models predicted the incidences reasonably well, although performance varied with the type of adverse health event and the prediction timeframe. These findings highlight the usefulness of our assessment models for formulating personalized health promotion strategies. Notably, hypertension and type 2 diabetes were, respectively, the first- and third-ranked diseases by expenditure for the National Health Insurance Service of Korea in 2021 [49], which underscores the practical value of predicting them. While the predictive performance of the models for heart disease, stroke, and cancer can be deemed decent, further improvements may be necessary depending on the purpose of the assessment.
Additionally, these findings have significant implications at a time when technological advances enable individuals to access their health data through personal health devices without visiting healthcare facilities [50]. As the availability and reliability of personal health device data continue to improve, person-generated health data will become richer, offering detailed and continuous information [51][52][53]. Our results suggest that health data from personal health devices can be a key input to HRAs, potentially replacing data collected in clinical settings and expanding the market potential of HRAs.
We examined three evaluation measures (AUC, accuracy, and F1-score) for risk predictions of six adverse health events across three prediction timeframes. Although the overall findings follow consistent trends, we encountered irregular results for a few specific target risks and evaluation measures, particularly accuracy and F1-score. These findings underscore the importance of choosing an appropriate measure when evaluating HRA models, depending on the target event and the intended use of the assessment results. For instance, F1-scores are designed to measure predictive performance in imbalanced data scenarios, such as the incidence of mortality and cancer. Additionally, when the cost of a false negative (failing to predict an event that occurs) outweighs the cost of a false positive (predicting an event that does not occur), sensitivity and specificity should not be weighted equally in the evaluation.
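One standard way to weight errors unequally, which is not the study's method, is to choose the cut-off that minimizes the expected misclassification cost instead of maximizing Youden's J. The following minimal sketch uses hypothetical cost values and toy data.

```python
from typing import Sequence, Tuple


def min_cost_threshold(y: Sequence[int], s: Sequence[float],
                       cost_fn: float, cost_fp: float) -> Tuple[float, float]:
    """Cut-off t (positive if score >= t) minimizing cost_fn * FN + cost_fp * FP."""
    best_t, best_cost = float("inf"), float("inf")
    # Candidate cut-offs: every observed score, plus +inf (predict no events).
    for t in sorted(set(s)) + [float("inf")]:
        fn = sum(1 for yi, si in zip(y, s) if yi == 1 and si < t)
        fp = sum(1 for yi, si in zip(y, s) if yi == 0 and si >= t)
        cost = cost_fn * fn + cost_fp * fp
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t, best_cost


y = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]
print(min_cost_threshold(y, scores, cost_fn=1, cost_fp=2))  # (0.7, 1)
print(min_cost_threshold(y, scores, cost_fn=5, cost_fp=2))  # (0.35, 2)
```

Raising the cost of a missed event from 1 to 5 moves the chosen cut-off from 0.7 down to 0.35, trading extra false positives for fewer false negatives.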
This study has limitations. First, because most NHSP participants in our database are over 40 years old, our findings may not generalize to other age groups or countries. Second, disease prevalence may be underestimated, particularly for diseases whose symptoms individuals did not notice and for which they did not seek care at healthcare facilities. This is especially relevant for diabetes and hypertension, which are known to affect a large proportion of individuals who are unaware of their condition [54][55][56]. Third, the reliance on self-reported questionnaire data for lifestyle and family history variables introduces the potential for omission and recall errors [57][58][59]. We anticipate that this limitation will be mitigated as the accessibility and accuracy of wearable and IoT data continue to improve [60,61]. Lastly, we used body measurements and personal health device data collected in clinical settings and assumed them to be equivalent to measurements taken in non-clinical settings. We expect this assumption not to affect the study's overall implications, as our focus was on future risk assessment rather than immediate disease diagnosis.
Future research should explore the potential of wearable and IoT data beyond blood pressure and blood sugar measurements in HRAs. These data sources can offer accuracy, automatic reporting, non-invasiveness, and continuous monitoring. Leveraging such high-quality data could significantly enhance HRAs and contribute to more effective lifestyle modification and health promotion.

Conclusions
This study aimed to assess the incremental predictive performance of four tiers of predictor variables (demographic, lifestyle and family history, personal health device, and laboratory) in predicting mortality and five chronic diseases across different timeframes. Our primary goal was to balance data acquisition costs against prediction accuracy to facilitate the widespread implementation of personalized health promotion strategies through HRAs. Our research makes three main contributions.
First, adding laboratory variables to demographic, lifestyle, family history, and personal health device variables did not substantially improve model performance for the examined health events. This suggests that removing the need for costly and inconvenient laboratory data acquisition could lower barriers to HRAs, especially for healthy individuals, thereby improving access to personalized health promotion.
Second, the models incorporating lifestyle and family history variables alongside demographic variables (Model 2) performed comparably to the full models in assessing the risk of heart disease, stroke, and cancer. For certain assessments, such as cancer in females, including further variables even reduced accuracy and F1-score. This underscores the reliability of Model 2 for risk assessments of these health events.
Lastly, our analysis suggests that the important features of Models 2 and 3, which include modifiable body measurements and health behavior variables, are better suited to designing health guidelines than those of models incorporating laboratory data. Guidance derived from Models 2 and 3 is therefore more relevant and practical for health practitioners and policymakers shaping effective personalized health management strategies.
In conclusion, our findings offer valuable insights for healthcare practitioners and policymakers, supporting the formulation of personalized health promotion strategies without significant data acquisition costs. By extending these strategies to a broader population, including those with limited access to healthcare facilities, we can foster more accessible and effective personalized health management. As we continue to navigate the balance between data acquisition costs and prediction performance, we anticipate further advancements in personalized healthcare.

Figure 1. The schematic diagram of the study dataset and analytical models.

Table 1. Definition of predictor variables with descriptive statistics/frequency distribution for male and female datasets (N = 425,148).

Table 2. Number of records (n) and prevalence (%) of adverse health events in the prediction models.

Table A1, with logistic regression results for comparison in Table A2 in Appendix A.

Table 3. The top five important features and the proportion of model variance explained by each feature.

Institutional Review Board Statement:
Institutional Review Board (IRB) exemption for this study was granted by Seoul National University Bundang Hospital (No. X-2201-7732-902) on 30 December 2021, as the data used were anonymized. Informed consent was not applicable because the study used the National Health Insurance Service (NHIS)-National Sample Cohort (NSC) data, a population-based cohort constructed by the NHIS of Korea from insurance eligibility, medical check-up, insurance claims, and death registry data. The data are anonymized and made available to researchers upon request and approval.