Developing a Health Risk Evaluation Method for Triple H

The development of a health evaluation system from human-related data is an important issue in preventive medicine. Previously, most studies have focused on disease assessment and prevention in patients. However, even if certain risk factors are all within normal ranges, individuals may not necessarily be completely healthy. This study focused on healthy individuals to develop a new index to assess health risks; this index can be used for the prevention of multiple diseases in healthy people. The kernel density technique was proposed to estimate the distribution of common risk factors and to develop a health risk index. A dataset of hypertension, hyperlipidemia, and hyperglycemia (Triple H) data from the National Health Insurance Research Database in Taiwan was used to demonstrate the proposed analytical process. The results of risk factor changes after six weeks of exercise were used to calculate the health risk index. The results showed that the subjects experienced a 7.29% reduction in their health risk index after the exercise intervention. This finding demonstrates the potential impact of an important reference index on quantifying the effect of maintenance in healthy people.


Introduction
For a long time, the diagnosis and treatment of diseases have been critical aspects of medical development. Many scholars have used data mining techniques on medical data to analyze the relationships between disorders and the real causes of those disorders [1][2][3][4]. Additionally, intelligent medical systems have been developed to help doctors diagnose illnesses [5][6][7]. However, abnormalities in physiological indicators may indicate not just one disease but several. Therefore, in recent years, determining the common risk factors and developing a predictor model for multiple diseases have become more important. Chang et al. [8] proposed a two-stage analysis procedure that used data mining techniques and mathematical approaches to determine the common risk factors (such as systolic blood pressure (SBP), triglycerides (TGs), uric acid, glutamate pyruvate transaminase, and gender) and predictive models for hypertension and hyperlipidemia. Medical decision systems, based on analyzing risk factors and predicting the functions of diseases, can help patients understand the risks of developing diseases and efficiently provide diagnostic references for medical personnel. In the current era of medicine, preventive medicine has gradually become more accepted. The motivation of this study was to propose a new health risk index to assess the health status of people from the perspective of preventive medicine. The rapid development of medical devices has made it easy to obtain information about physiological indicators. In the past, when all physiological indicators were within the standard range, the probability of illness was considered small and the patient was regarded as likely healthy. However, such information does not serve the purpose of preventive medicine. Therefore, a novel index must be developed to assess health status when there are changes in risk factors in healthy people.
Int. J. Environ. Res. Public Health 2019, 16, 1168
Certain health care practices have also recently become more accepted; for example, people have started to use exercise, meditation, and diet to control risk factors. Whelton et al. [9] studied the effect of aerobic exercise on blood pressure and found that SBP and diastolic blood pressure (DBP) decreased by an average of 3.84 and 2.58 mmHg, respectively, over a period of aerobic training in 2419 subjects aged between 21 and 79 years old. Additionally, Fagard [10] designed a one-year experiment to observe the effects of exercise and diet on blood pressure; SBP and DBP decreased by an average of 3.4 and 2.4 mmHg, respectively, in 68 subjects after the implementation of an exercise regimen. Fagard [10] also found that exercise combined with dietary control resulted in a more significant effect than only exercise or only dietary control. Stewart et al. [11] designed an exercise plan for 115 subjects aged between 55 and 75 years old who were divided into two groups: a control group and a group that exercised three times a week. The results demonstrated that exercise could improve physical fitness and decrease the risk of many physiological factors related to cardiovascular diseases and diabetes. Based on the above discussion, exercise maintenance can reduce the values of several risk factors.
However, there has been little research on a method to evaluate the health status of a healthy individual when their risk factors change even after exercise maintenance. For example, hypertension patients are defined by an SBP of >140 mmHg or a DBP of >90 mmHg. When the SBP ranges between 120 and 135 mmHg, which is within the normal range, the resulting health effects may differ in different individuals. Vasan et al. found that in 9845 subjects with normal blood pressure, only 5.3% became hypertensive after four years; however, 17.6% of the subjects who developed hypertension originally had an SBP between 120 and 129 mmHg and a DBP between 80 and 84 mmHg [12]. The American Diabetes Association defines diabetes as a fasting blood glucose level of >126 mg/dL. The normal fasting blood glucose level for non-diabetics should range between 70 and 100 mg/dL. Although a fasting blood glucose level of <100 mg/dL is normal, a fasting blood glucose level between 100 and 125 mg/dL is considered pre-diabetic. Nichols et al. found that 8.1% of patients with fasting glucose levels between 100 and 109 mg/dL and 24.3% of those with levels between 110 and 125 mg/dL developed diabetes after an average of 41.4 months [13]. Therefore, risk factor values that are closer to the threshold will increase the risk of developing the respective disease. From the perspective of preventive medicine, understanding the degenerating state of healthy individuals will help prevent future disease.
Maintainability is widely used in industrial applications, where it can be used to assess the life of a machine or production system. The physical structures of the human body contain similar production systems; thus, an approach transferring the concepts of maintainability to human health assessment is the focus of this study. In practical applications, information on risk factors (such as blood pressure, heart rate, electroencephalography (EEG), and electromyography (EMG)) can be collected quickly with the widespread use of home medical equipment. Therefore, in this study, a kernel density technique was used to develop a health risk index based on risk factors from human-related data. In practice, changes in health risk indicators can be used to establish an early warning mechanism. While the health risk indicator is gradually declining, the person will still be considered healthy; however, if they fail to control the risk factor, they will likely develop the disease. Thus, a health risk index can estimate the health status of people and help them better understand their health conditions. Additionally, manufacturers should apply new techniques [14][15][16] to develop rapid physiological devices or sensors to measure most risk factors. For example, Zhang et al. present state-of-the-art research progress on cardiovascular health informatics and focus on three major challenges: unobtrusive and wearable multi-parameter sensors, fast multimodal imaging technologies, and novel multi-scale information fusion heart models [17].
In this paper, the health risk index was proposed to evaluate a person's level of health. Figure 1 displays the framework of this study. The differences from previous methods are primarily a result of different study purposes. Previous research has developed disease prediction models through algorithmic designs by collecting information about specific diseases in healthy and unhealthy populations. Those algorithms use feature selection techniques to identify risks and establish a classification model by collecting data on risk factors. Their purpose is to determine whether a person is healthy or sick by measuring risk factors. Conversely, this study focuses on the assessment of healthy people. The similarity lies in the determination of risk factors, whereas the difference lies in the development of health risk indicators based on the data of healthy people. The purpose of this study was to assess health outcomes, particularly in high-risk groups. When health risk indicators decrease, individuals may develop diseases or otherwise become unhealthy. At this point, risk factors must be controlled to avoid disease. The results of this study can be used to jointly develop AI in healthcare through cloud computing to evaluate the trends and changes of users' health. This information is important for disease prevention.

Materials and Methods
This study proposed a novel three-stage analysis procedure involving a feature selection method, the kernel density estimation method, and mathematical approaches to calculate the health risk index of healthy people. Stage 1 adopted the findings from our previous study [8], which used six classification techniques to individually screen for the key risk factors of each disease and then compared the risk factors across diseases to determine the common risk factors for multiple diseases. Based on those results [8], we used five common risk factors, namely fasting plasma glucose (FPG), total cholesterol (T-CHO), TGs, SBP, and DBP, to determine the risks for hypertension, hyperlipidemia, and hyperglycemia. Stage 2 used the kernel density estimation method to fit density curves for the common risk factors individually. Finally, stage 3 calculated the health risk index based on the kernel density function of the common risk factors. The two primary methods, kernel density estimation and the calculation of the health risk index, are described below. The proposed methodology is demonstrated using data on 6496 subjects (3104 males and 3392 females) from the Triple H dataset of the National Health Insurance Research Database in Taiwan and the research results of Stewart et al. [11], who conducted a six-week exercise intervention. The data of Stewart et al. [11] were applied because the five risk factors they reported are the same as the five common risk factors in this study. In addition, the effectiveness of preventive maintenance was evaluated using a health risk curve.

Kernel Density Estimation Method
The kernel density estimation approach was first described by Rosenblatt [17] and Parzen [18]. It is a nonparametric statistical method used to estimate an unknown probability distribution. The method does not require a priori knowledge or make any additional assumptions regarding data distribution. In practice, it is often assumed that the values of the risk factor follow a normal distribution; however, this assumption lacks strong supporting evidence. Therefore, in this study, we adopted the kernel density method to estimate the risk factor distribution.
If $x_1, x_2, \ldots, x_n$ are independent and identically distributed observations from an unknown distribution, the probability density function $\hat{f}_h$ can be estimated by the kernel density function as follows:
$$\hat{f}_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right)$$
where $K(\cdot)$ is a kernel function that is symmetric and integrates to one, and $h$ is the bandwidth, which determines the degree of smoothness of the estimate. This study evaluated six common types of kernel functions (Gaussian, Epanechnikov, Triangular, Uniform, Biweight, and Cosine) to estimate the kernel density values for all the common risk factors. Table 1 shows the six common types of kernel functions.
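The estimator described above can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function names `gaussian_kernel` and `kde`, the simulated SBP-like sample, and the bandwidth value of 4.0 are assumptions made for the example and are not taken from the study.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal density: symmetric and integrates to one."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def kde(x_eval, samples, h, kernel=gaussian_kernel):
    """Evaluate f_hat_h(x) = 1/(n*h) * sum_i K((x - x_i) / h) at each x_eval."""
    x_eval = np.atleast_1d(np.asarray(x_eval, dtype=float))
    samples = np.asarray(samples, dtype=float)
    # One row per evaluation point, one column per observation.
    u = (x_eval[:, None] - samples[None, :]) / h
    return kernel(u).sum(axis=1) / (samples.size * h)

# Hypothetical SBP-like readings (mmHg); simulated, not data from the study.
rng = np.random.default_rng(0)
samples = rng.normal(120.0, 10.0, size=200)
grid = np.linspace(60.0, 180.0, 1201)
density = kde(grid, samples, h=4.0)

# Riemann sum over the grid; a valid density estimate should give roughly 1.
area = float(np.sum(density) * (grid[1] - grid[0]))
```

Any other kernel from Table 1 could be passed through the `kernel` argument, provided it is symmetric and integrates to one.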

Table 1. Six common types of kernel functions (columns: Types, Kernel Function).
When using the kernel density estimation method, we must choose the kernel function and set the bandwidth. The kernel function was chosen from the six common types in Table 1. To select the kernel function, all values of the risk factor were first used to calculate the probability density values, which served as the real density values. Then, each of the six common types of kernel functions in Table 1 was used to estimate the kernel density function for the risk factor, giving the estimated density values. Finally, the kernel function with the minimum summed difference between the real density values and the estimated density values was chosen as the suitable kernel function. Therefore, the density functions of different risk factors might be fitted with different kernel functions.
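A minimal sketch of this selection step is given below. Several details are assumptions, because the text does not specify them: the "real density value" is taken to be a normalized histogram, the difference is measured as a summed absolute gap at the bin centers, and the fifth kernel is implemented as the biweight (quartic) kernel. The names `KERNELS` and `select_kernel` and the simulated FPG sample are hypothetical.

```python
import numpy as np

def _inside(u):
    """Indicator of |u| <= 1, used by the compact-support kernels."""
    return (np.abs(u) <= 1.0).astype(float)

# Standard textbook forms of six common kernels.
KERNELS = {
    "gaussian": lambda u: np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi),
    "epanechnikov": lambda u: 0.75 * (1.0 - u**2) * _inside(u),
    "triangular": lambda u: (1.0 - np.abs(u)) * _inside(u),
    "uniform": lambda u: 0.5 * _inside(u),
    "biweight": lambda u: (15.0 / 16.0) * (1.0 - u**2) ** 2 * _inside(u),
    "cosine": lambda u: (np.pi / 4.0) * np.cos(np.pi * u / 2.0) * _inside(u),
}

def kde(x_eval, samples, h, kernel):
    """Kernel density estimate at the points x_eval."""
    u = (np.asarray(x_eval)[:, None] - np.asarray(samples)[None, :]) / h
    return kernel(u).sum(axis=1) / (len(samples) * h)

def select_kernel(samples, h, bins=30):
    """Return (best_name, errors), where errors[name] is the summed absolute
    gap between the histogram reference density and the kernel estimate."""
    reference, edges = np.histogram(samples, bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    errors = {
        name: float(np.abs(reference - kde(centers, samples, h, k)).sum())
        for name, k in KERNELS.items()
    }
    return min(errors, key=errors.get), errors

rng = np.random.default_rng(1)
fpg = rng.normal(90.0, 8.0, size=500)  # hypothetical FPG values (mg/dL)
best, errors = select_kernel(fpg, h=3.0)
```

Running the selection separately for each risk factor and each physiological state would yield one chosen kernel per combination, which is the kind of result the study reports in its tables.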
The bandwidth of the kernel function is an important parameter that has a strong influence on the resulting estimate. An overly narrow bandwidth over-fits the data, whereas an overly wide bandwidth oversmooths the estimate and fits the data poorly. Figure 2 shows the differences for 100 standard normally distributed data points. When the bandwidth was set to 0.05 or 2, there was a large difference from the original density distribution; conversely, when the bandwidth was set to 0.337, there was an excellent fit.
The mean integrated squared error (MISE) was used to choose the best bandwidth. The MISE can be calculated as follows:
$$\mathrm{MISE}(h) = E\left[\int \left(\hat{f}_h(x) - f(x)\right)^2 dx\right]$$
The MISE was separated into two parts:
$$\mathrm{MISE}(h) = \int \mathrm{Bias}^2\left(\hat{f}_h(x)\right) dx + \int \mathrm{Var}\left(\hat{f}_h(x)\right) dx$$
where $\hat{f}_h(x)$ was the estimated density value using the kernel function, and $f(x)$ was the unknown true density value. Using the Taylor expansion method, $\int \mathrm{Bias}^2(\hat{f}_h(x)) dx$ was derived as $\frac{1}{4} h^4 \mu_2^2 R(f'')$, where $\mu_2 = \int z^2 K(z) dz$ and $R(f'') = \int f''(x)^2 dx$. Here, $n$ and $h$ were the size of the data and the bandwidth, respectively. If $n$ was large and $h$ was small, the integrated variance of $\hat{f}_h(x)$ can be derived as $\frac{1}{nh} \int K^2(z) dz$. Therefore, an approximate mean integrated squared error (AMISE) was calculated as follows:
$$\mathrm{AMISE}(h) = \frac{1}{4} h^4 \mu_2^2 R(f'') + \frac{1}{nh} \int K^2(z) dz$$
We estimated the optimal bandwidth by setting the first derivative of the AMISE with respect to $h$ to zero. The optimal bandwidth was:
$$\hat{h} = \left[\frac{\int K^2(z) dz}{n \mu_2^2 R(f'')}\right]^{1/5}$$
In practice, the above formula is circular, because evaluating it requires both $\int K^2(z) dz$ and the unknown function $f(x)$ that is being estimated. Liu et al. compared the accuracies of nine types of bandwidth selection methods and found that no particular method performed better for all problems [19]. Therefore, in this study, we used the NDR0 method, which was suggested by Liu et al. [19]. The NDR0 method has the advantage that the bandwidth is easily calculated from the standard deviation (σ) and inter-quartile range of the dataset.
From the above procedure, we can determine the probability density function of the risk factors using the kernel density estimation approach. Next, the health risk index was proposed to estimate the health status of healthy people.
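The exact NDR0 expression is not reproduced in the text, so the sketch below assumes it corresponds to the widely used normal-reference rule of thumb (the rule that R exposes as `bw.nrd0`), h = 0.9 · min(σ, IQR/1.34) · n^(−1/5). The function name `rule_of_thumb_bandwidth` is hypothetical, and the constant 0.9 and the divisor 1.34 belong to that assumed rule rather than to anything stated by the authors.

```python
import numpy as np

def rule_of_thumb_bandwidth(samples):
    """Assumed NDR0 form: h = 0.9 * min(sigma, IQR / 1.34) * n**(-1/5)."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    sigma = samples.std(ddof=1)                  # sample standard deviation
    q75, q25 = np.percentile(samples, [75, 25])  # inter-quartile range endpoints
    spread = min(sigma, (q75 - q25) / 1.34)      # robust spread estimate
    return 0.9 * spread * n ** (-0.2)

# 100 standard normal points, mirroring the setting described for Figure 2.
rng = np.random.default_rng(2)
z = rng.standard_normal(100)
h = rule_of_thumb_bandwidth(z)
```

For 100 points with a spread close to 1, this rule gives a bandwidth of roughly 0.36, which is of the same order as the value 0.337 quoted for the well-fitting curve in Figure 2. That agreement is suggestive but does not confirm that this is the exact formula the study used.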

Health Risk Index Calculation
$R(t)$ was defined as a function of human health evaluation; it describes the health state when a risk factor has reached the value $t$ while the physiological state is still healthy. In other words, $R(t)$ is the health risk at value $t$ for a particular risk factor. With an increase in the risk factor value, the disease or health risk will also increase. We assumed that $n$ diseases were studied, giving one normal state (healthy people) and $(2^n - 1)$ combinations of people who suffered from different diseases. $f_{1,i}$ denotes the probability density function of the $i$th risk factor for healthy individuals, and $f_{2,i}, f_{3,i}, \ldots, f_{2^n,i}$ denote the probability density functions of the $i$th risk factor for the different combinations of diseases. Each probability density function was estimated using the kernel density estimation approach. In this study, the health risk index $R_i(t)$ was defined for healthy people with the $i$th risk factor and was calculated as follows:
$$R_i(t) = \frac{f_{1,i}(t)}{\sum_{j=1}^{2^n} f_{j,i}(t)}, \quad x_i \le t \le y_i$$
where $t$ represents the value of the $i$th risk factor, and the interval $[x_i, y_i]$ represents the range of values of the $i$th risk factor for healthy people. Figure 3 presents an example of three different status functions of a risk factor to describe the health risk index. The solid line represents the function of the risk factor values of healthy subjects, and the other two lines with signs represent groups that suffered from different diseases.
Since the value $t$ of the risk factor for healthy subjects only fell between $x$ and $y$, $R(t)$ was the ratio of the probability density of the normal group, $P_1(t)$, to the sum of the probability densities of the three status functions in Figure 3, $P_1(t)$, $P_2(t)$, and $P_3(t)$. Therefore, $R(t)$ can be simplified as:
$$R(t) = \frac{P_1(t)}{P_1(t) + P_2(t) + P_3(t)}, \quad \text{for } x \le t \le y.$$
Each probability density value was non-negative, and the numerator is one of the terms in the denominator. Therefore, $R(t)$ was between 0 and 1.
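The ratio above can be sketched as follows. The three status functions are replaced here by hypothetical normal densities purely for illustration, whereas the study estimates them with kernel density estimation. The names `health_risk` and `normal_pdf`, the parameter values, and the healthy range (90, 140) are assumptions.

```python
import numpy as np

def health_risk(t, densities, healthy_range):
    """R(t) = P_1(t) / sum_j P_j(t) inside the healthy range [x, y].

    densities: list of callables; densities[0] is the healthy-state density.
    Values of t outside [x, y] are returned as NaN, because the ratio is
    only defined on the healthy range in this sketch.
    """
    t = np.atleast_1d(np.asarray(t, dtype=float))
    x, y = healthy_range
    values = np.array([d(t) for d in densities])  # shape: (states, len(t))
    total = values.sum(axis=0)
    ratio = np.divide(values[0], total, out=np.zeros_like(total), where=total > 0)
    return np.where((t >= x) & (t <= y), ratio, np.nan)

def normal_pdf(mu, sd):
    """Return a normal density function with mean mu and standard deviation sd."""
    return lambda t: np.exp(-0.5 * ((t - mu) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

# Hypothetical stand-ins for the healthy curve and the two disease curves.
states = [normal_pdf(115.0, 10.0), normal_pdf(140.0, 12.0), normal_pdf(150.0, 12.0)]
r = health_risk([100.0, 120.0, 135.0], states, healthy_range=(90.0, 140.0))
```

With these stand-in densities the index falls as the risk factor value approaches the region dominated by the disease groups, which is the qualitative behavior the ratio is meant to capture.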
If any of the risk factors were over the threshold values, we determined that the person had the disease(s). Therefore, the health risk index $R_{hri}(t)$ for a healthy person can be presented as follows:
$$R_{hri}(t) = \begin{cases} R(t), & x \le t \le y \\ 0, & t > y \end{cases}$$
We can plot the curve of the health risk index from $R_{hri}(t)$ for all $t$ values of the risk factors. According to the health risk index curve, we can then evaluate the effect on health risks when risk factors change after a period of maintenance activities (such as exercise or diet control). Figure 4 shows an example of the health risk curve for SBP from 90 to 160 mmHg based on real data. In Figure 4, when the SBP is greater than 127 mmHg, the health risk curve begins to decline, indicating a gradual decline in health. When the SBP is greater than 140 mmHg, the health risk index is zero, indicating that the individual has the disease. For example, if a healthy person had an SBP of 135 mmHg before dietary control, the health risk index $R_{hri}(t)$ of SBP would be 0.4. Moreover, if their SBP decreased to 130 mmHg after a period of dietary control, the new health risk index $R_{hri}(t)$ of SBP would be 0.72. Therefore, the difference between 0.72 and 0.4 (0.32) is the effect of the dietary control.
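The SBP example can be restated as a short calculation. The index values 0.40 and 0.72 and the 140 mmHg threshold are the ones quoted in the text; the helper name `thresholded_index`, the strict comparison at the boundary, and the third call with a value of 0.10 at 145 mmHg are assumptions added for illustration.

```python
def thresholded_index(r_value, t, threshold):
    """Return r_value while t does not exceed the threshold, otherwise 0."""
    return r_value if t <= threshold else 0.0

SBP_THRESHOLD = 140.0  # mmHg, the hypertension threshold quoted in the text

before = thresholded_index(0.40, t=135.0, threshold=SBP_THRESHOLD)
after = thresholded_index(0.72, t=130.0, threshold=SBP_THRESHOLD)
effect = round(after - before, 2)  # effect of the dietary control

# Hypothetical reading above the threshold: the index is forced to zero.
diseased = thresholded_index(0.10, t=145.0, threshold=SBP_THRESHOLD)
```

The difference `effect` equals 0.32, matching the value given in the text.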

Results and Discussion
Hypertension, hyperlipidemia, and hyperglycemia are three common diseases that are associated with metabolic syndrome and related to metabolic abnormalities. If a person has these diseases, the risk of developing other chronic diseases increases as well. A dataset from the National Health Insurance Research Database in Taiwan was used to elucidate the proposed analytical procedures. The dataset included 6496 subjects (3104 males and 3392 females) aged over 15 years old and 17 physiological indicators.
Next, the analysis procedure established the appropriate kernel function for the common risk factors in every physiological state of hypertension, hyperlipidemia, and hyperglycemia. Tables 2-6 show the kernel density functions selected for each risk factor by the estimation approach proposed in this study. As an example, Table 2 shows the outcome of the kernel density approach for FPG in the different physiological states (hypertension, hyperlipidemia, and hyperglycemia). In the normal state, the minimum value among the six estimated kernel density approaches was 0.0725. Therefore, the Gaussian type was selected as a suitable kernel function for FPG in the normal state. Similarly, the minimum value was 1.0622 for hyperlipidemia + hyperglycemia and 1.1331 for hypertension + hyperlipidemia + hyperglycemia. The Gaussian type was also the suitable kernel function for FPG in hyperlipidemia + hyperglycemia and hypertension + hyperlipidemia + hyperglycemia. The Triangular type was the appropriate kernel function for FPG for hypertension, hyperlipidemia, hyperglycemia, hypertension + hyperlipidemia, and hypertension + hyperglycemia, with the minimum values shown in Table 2.
Based on the results in Tables 2-6, Table 7 summarizes all kernel density functions for the five common risk factors in every physiological state. Based on the data in Table 7 and the proposed formula, the health risk curves of the five risk factors in the normal state were calculated and plotted (Figure 5). Figure 5a-d displays the health risk curves for FPG, T-CHO, TG, SBP, and DBP. The health risk curves of the risk factors are monotonically decreasing, like the curve shown in Figure 4. When the FPG reached 126 mg/dL, T-CHO or TG reached 200 mg/dL, SBP reached 140 mmHg, or DBP reached 90 mmHg, the health risk value became 0. When any risk factor value remained over the threshold, the probability that the subject was in the normal physical state was 0.
The health risk index of the physiological system could be determined by multiplying the health risk values of the risk factors. Table 8 shows the results of ten subjects who were selected from the dataset. The ten subjects were randomly recruited to measure the risk factor data. The brackets in Table 8 indicate the health risk index values. For example, the FPG value for Subject 1 was 89 (0.9527); this indicated that the health risk index of an FPG level of 89 mg/dL was 0.9527. The health risk index of the physiological system of several subjects (Subjects 1, 2, 4, and 8) was 0 because some of their risk factor values were over the clinical thresholds. For instance, Subject 4 suffered from hypertension and hyperglycemia. The six healthy people were Subjects 3, 5, 9, 10, 7, and 6, arranged in descending order according to their health risk indexes. Although the six subjects had normal health statuses, the health risk index values of their physiological systems differed. When applying the health index, the study of Stewart et al. [11] was used as a reference. Stewart et al. [11] reported the effect of an exercise regimen on the five risk factors. After a six-week exercise routine, FPG, T-CHO, TG, SBP, and DBP levels decreased by an average of 0.2 mg/dL, 5.2 mg/dL, 13.4 mg/dL, 5.3 mmHg, and 3.7 mmHg, respectively. We quantified the effect of these changes in the risk factors by calculating the health risk index. The results showed that subjects experienced a 7.29% reduction in their health risk index for hypertension, hyperlipidemia, and hyperglycemia.
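The multiplication step can be illustrated with a short sketch. Only the FPG index of 0.9527 is taken from the text; the other per-factor values, the function names `system_index` and `relative_change`, and the numbers passed to `relative_change` are hypothetical. The sketch does not attempt to reproduce the 7.29% figure, which depends on the study's fitted health risk curves.

```python
import math

def system_index(factor_indices):
    """Product of the per-factor health risk indices; 0 if any factor is 0."""
    return math.prod(factor_indices.values())

def relative_change(before, after):
    """Percentage change of the system index from before to after."""
    return 100.0 * (after - before) / before

# Hypothetical per-factor indices; only the FPG value 0.9527 is quoted in the text.
subject = {"FPG": 0.9527, "T-CHO": 0.90, "TG": 0.85, "SBP": 0.72, "DBP": 0.80}
baseline = system_index(subject)

# One factor past its clinical threshold drives the whole system index to zero.
over_threshold = dict(subject, SBP=0.0)
flagged = system_index(over_threshold)

# Illustrative before/after system indices for an intervention.
change = relative_change(0.50, 0.55)
```

Because the system index is a product, a single factor over its threshold is enough to mark the subject as no longer in the normal state, as with the subjects in Table 8 whose system index is 0.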
The dataset on hypertension, hyperlipidemia, and hyperglycemia (Triple H) from the National Health Insurance Research Database in Taiwan and the research results of Stewart et al. [11] were used to explain the proposed methodology and application. Previous research has focused on disease prediction and medical treatment after disease identification. If judged as healthy, there is no corresponding early warning mechanism. The difference between the proposed method and the previous method is that this study develops an evaluation health risk index from the perspective of probability reliability. The estimation of the optimal probability distribution is established using the kernel density approach rather than using a normal probability function. The research results can be applied to solve the problem of health warning. When people are considered healthy, health risk indicators can be further calculated to understand their level of health. This benefits the individuals because risk factors can be controlled based on fitness by implementing strategies to increase exercise or control diet. Furthermore, the health risk index can be used to evaluate the effectiveness of the strategy and adjust the strategy. This will concretely improve the effective use of resources in the medical/health field.

Conclusions
Based on the analysis of the results, we can make several conclusions regarding research in this area. The human body is a complex system, and a single-disease study is insufficient to assess multiple diseases. Most previous studies that conducted medical data analyses focused on risk factor selection and creating prediction and control models of diseases. These studies were able to monitor risk factors and estimate the risk of developing various diseases. However, the human body may not necessarily be completely healthy even when risk factors are all within normal ranges. Therefore, our study focused on assessing the degree of health. In this paper, common risk factors for multiple diseases were used to estimate the optimal probability density function, and a novel health risk index was proposed to evaluate the health status of healthy people.
In practice, a health risk curve can be used to elucidate the relationship between risk factors and health risk. Prevention is often more desirable than curing diseases, and proper maintenance (such as participating in sports) may reduce the likelihood of further health decline. Furthermore, the effectiveness of preventive maintenance can be assessed through a health risk curve to determine appropriate adjustments to the maintenance strategy.