Another Look at Obesity Paradox in Acute Ischemic Stroke: Association Rule Mining

Although obesity is generally associated with the development of cardiovascular disease (CVD) risk factors, previous studies have also reported that obesity has a beneficial effect on CVD outcomes. We aimed to verify the existing obesity paradox through binary logistic regression (BLR) and to clarify the paradox via association rule mining (ARM). Patients with acute ischemic stroke (AIS) were assessed for their 3-month functional outcome using the modified Rankin Scale (mRS) score. Predictors of poor outcome (mRS 3–6) were analyzed through BLR, and ARM was performed to identify which combinations of risk factors were concurrently associated with good outcomes using maximal support, confidence, and lift values. Among 2580 patients with AIS, being obese (odds ratio [OR], 0.78; 95% CI, 0.62–0.99) had a beneficial effect on the outcome at 3 months in BLR analysis. In addition, the ARM algorithm showed that obesity with a good outcome was also associated with an age less than 55 years and mild stroke severity. While BLR analysis showed a beneficial effect of obesity on stroke outcome, ARM analysis showed that obese patients had a relatively favorable combination of risk factors compared to patients with normal BMI. These results may partially explain the obesity paradox phenomenon in AIS patients.


Introduction
Stroke is a leading cause of death and disability globally [1]. Obesity is a known risk factor for cardiovascular disease (CVD) [2]. Reports also suggest that obesity may be protective for CVD severity or outcome [3][4][5]. Though most population-based cohort studies have shown that obesity increases the risk of stroke [6][7][8][9], obesity has been associated with better outcomes among patients with acute ischemic stroke (AIS) [10][11][12][13][14][15][16]. Khan et al. evaluated associations between BMI and lifetime CVD risk and mortality among a population-based cohort without established CVD at baseline [17]. Observing more than 3 million person-years, they found that being overweight was associated with earlier development of CVD, and obesity was associated with reduced longevity and cardiovascular survival. In addition, obesity increased the risk of hypertension, diabetes, and dyslipidemia. There is still insufficient explanation to determine whether the relationship between obesity and CVD outcome analyzed using logistic regression is beneficial or not.
The association rule mining (ARM) algorithm was introduced for market basket analysis by Agrawal et al. and has identified significant association patterns within a variety of settings [18][19][20][21][22][23]. ARM is used for explanatory data visualization to summarize certain concurrent combinations of dependent factors (important association rules) with a specific condition in a non-hierarchical fashion [19,24]. Moreover, ARM helps to discover important relationships among complex phenomena [18,19]. Therefore, we aimed to verify the obesity paradox with logistic regression and provide a new perspective through the ARM method on the relationship between obesity and ischemic stroke outcome.

Participants
We obtained data from the prospective stroke registry of our hospital, the detailed information of which has been described previously [25]. Briefly, patients were admitted to the hospital within 7 days after the onset of focal neurologic deficits (2012–2019). Patients included in the study were identified as having relevant acute lesions on diffusion MRI. Patients with cerebral hemorrhages were excluded from the study. During the data capture period, procedures affecting the prognosis of patients, such as extending the time window of intravenous thrombolysis and endovascular treatment, were reflected in all included patients (Figure S1). We collected demographic data and accessed data for clinical and laboratory findings for patients admitted with AIS using a standardized web server, which included central coordinator requests for regular audits and amendments of the data.

Measurement of Body Mass Index
Height and weight were recorded by stroke nurses immediately after hospitalization for stroke. BMI was calculated as weight in kilograms divided by the square of height in meters (kg/m²). Patients were grouped as underweight (<18.5 kg/m²), normal weight (18.5–22.9 kg/m²), overweight (23–24.9 kg/m²), and obese (≥25 kg/m²) according to the Asian Pacific World Health Organization criteria [26].
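As a minimal illustration of the BMI calculation and grouping described above, the following Python sketch may help. The function names are hypothetical, and the handling of values that fall between the published bands (for example 22.95 kg/m²) is an assumption: such values are assigned to the lower group.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in meters squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def bmi_category(value: float) -> str:
    """Map a BMI value (kg/m^2) to the four groups used in the text.

    Assumption: the cut-offs 18.5, 23, and 25 are treated as lower
    bounds of the normal weight, overweight, and obese groups.
    """
    if value < 18.5:
        return "underweight"
    if value < 23.0:
        return "normal weight"
    if value < 25.0:
        return "overweight"
    return "obese"


# Hypothetical patient: 70 kg and 1.75 m
print(bmi_category(bmi(70, 1.75)))
```

Keeping the grouping in one function makes it straightforward to swap in other cut-offs when comparing against studies that used different BMI criteria.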

Covariates
Vascular risk factors were defined based on our previous report [25]. ARM generally assesses possible associations between discrete variables, and the algorithm can be easily applied to categorical variables. Therefore, we used the following categorical variables for the elucidation of risk profiles and the underlying mechanisms of stroke:

• Hypertension was defined as blood pressure ≥140/90 mmHg in more than two consecutive readings or taking antihypertensive agents;
• Diabetes was defined as fasting blood glucose ≥126 mg/dL, random blood glucose readings ≥200 mg/dL with relevant diabetic symptoms, or glycated hemoglobin ≥6.5% [27].

Study Outcomes
Stroke physicians or certified nurses prospectively assessed patients' modified Rankin Scale (mRS) score as a 3-month functional outcome measure when the patients visited the outpatient clinic or via a telephone interview. A poor functional outcome was defined as an mRS score of 3–6.

Statistical Analyses
We compared the baseline differences for independent variables between good and poor outcome groups using the χ² test or Student's t-test, as appropriate. Univariate and multivariate binary logistic regression analyses were performed to assess predictors for poor outcomes at 3 months. Independent variables with a two-sided p-value < 0.05 in univariate analyses were entered into the multivariate analyses.
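For a single binary predictor, the odds ratio from a univariate logistic regression coincides with the cross-product ratio of the 2×2 table, and its Wald confidence interval can be computed directly from the cell counts. The Python sketch below illustrates this special case only. The function name and the counts are hypothetical and are not taken from the study data.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: exposed, poor outcome      b: exposed, good outcome
    c: unexposed, poor outcome    d: unexposed, good outcome
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only
or_, lo, hi = odds_ratio_ci(10, 20, 20, 10)
print(f"OR = {or_:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

An odds ratio below one with an interval that excludes one would correspond to a predictor associated with lower odds of a poor outcome. Multivariate models additionally adjust for the other covariates and require a fitting routine that is not shown here.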

Association Rule Mining
Frequent pattern growth (fp-growth) is one of the ARM algorithms used to evaluate associations among variables and to find important linkage patterns using the quantitative parameters of support, confidence, and lift [29]. The formulas and definitions of these parameters are given in Table 1. To understand these parameters, let X and Y be events in the real world, so that p(X) and p(Y) are the probabilities that these events occur. Here, sup(X→Y) is the probability that X and Y occur at the same time. The conf(X→Y) is the conditional probability that Y occurs given event X. The last parameter, lift(X→Y), is the confidence scaled by the probability of Y, p(Y|X)/p(Y), and indicates whether events X and Y are mutually dependent (when the lift is equal to one, X and Y are independent).

Table 1. Formulas and definitions of support, confidence, and lift.
Support. Formula: sup(X→Y) = p(X ∩ Y). Definition & Meaning: The value of support indicates how frequently the rule appears in the data.
Confidence. Formula: conf(X→Y) = p(Y|X) = p(X ∩ Y)/p(X). Definition & Meaning: The confidence indicates how accurate the rule is.
Lift. Formula: lift(X→Y) = p(Y|X)/p(Y). Definition & Meaning: The lift measures the dependency between the predictor and the response. A lift value close to one indicates independence.
Let X be a subset of predictors and Y a response; p denotes the probability of an association.
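The three parameters can be computed directly from their definitions. The following Python sketch is an illustration under stated assumptions: each patient record is represented as a set of categorical labels, and the function name and example records are hypothetical rather than taken from the study.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence, and lift of the rule antecedent -> consequent.

    transactions: list of sets of item labels
    antecedent, consequent: iterables of item labels (X and Y)
    """
    x, y = frozenset(antecedent), frozenset(consequent)
    n = len(transactions)
    if n == 0:
        raise ValueError("no transactions")
    n_x = sum(1 for t in transactions if x <= set(t))          # records containing X
    n_y = sum(1 for t in transactions if y <= set(t))          # records containing Y
    n_xy = sum(1 for t in transactions if (x | y) <= set(t))   # records containing both
    if n_x == 0 or n_y == 0:
        raise ValueError("antecedent or consequent never occurs")
    support = n_xy / n                 # p(X and Y)
    confidence = n_xy / n_x            # p(Y | X)
    lift = confidence / (n_y / n)      # p(Y | X) / p(Y)
    return support, confidence, lift


# Hypothetical records, for illustration only
records = [
    {"obese", "age<55", "good"},
    {"obese", "age<55", "good"},
    {"obese", "good"},
    {"normal", "poor"},
    {"normal", "good"},
    {"obese", "poor"},
]
print(rule_metrics(records, {"obese"}, {"good"}))
```

In this toy example the lift is greater than one, which would indicate that the antecedent and the consequent occur together more often than expected under independence.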
The fp-growth algorithm is noted for its efficiency: owing to its tree-like structure, it requires fewer calculations than the apriori algorithm [29]. To illustrate the algorithm, let T denote the set of transactions {t_1, t_2, ..., t_n} and I = {i_1, i_2, ..., i_m} the list of all items (Table 2). First, reorder the items in each t_j in descending order of support(i_k) for all k in [1, m]. In these ordered transactions, items whose support is below the user-defined minimum support are excluded to reduce later calculations. Second, to grow an fp-tree, position and connect the items of the initial transaction, t_1, starting from a root node (null node), as in Figure 1. Each time an item is added to the tree, increment the count of its node by one. Third, for each following transaction, if the leading part of its item sequence overlaps with a path from a preceding transaction, the overlapping items are merged into that path, and the remaining items are connected after the merged node. However, if no prefix is shared, the new transaction is handled as in the second step, by connecting its sequence of items to the null node. Last, since a single item is also a candidate frequent pattern, dashed lines are drawn in the fp-tree to link occurrences of the same item in different branches of the tree. Once the tree is fully grown by these steps, candidate frequent patterns are evaluated against the minimum support and confidence. For example, in Figure 1, when the minimum support τ equals two, the sequence {null, i_1, i_3} (equivalent to {i_1, i_3}) is the only candidate satisfying the lower bound τ, which means this sequence is a frequent pattern (rule) of the transaction data. Other candidate patterns from the fp-growth algorithm can also be considered by using less conservative minimum supports.
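The tree-growing steps described above can be sketched in Python as follows. This is a minimal illustration, not the implementation used in the study: it covers only the construction of the fp-tree (not the subsequent pattern mining), the class and function names are hypothetical, ties in support are broken alphabetically as an assumption, and the header table plays the role of the dashed links between identical items.

```python
from collections import Counter


class FPNode:
    """One node of the fp-tree: an item, its count, and its children."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support_count):
    """Grow an fp-tree; returns (root, header).

    header maps each frequent item to the list of its nodes, which
    corresponds to the links between identical items in different branches.
    """
    # Step 1: count item supports and drop items below the minimum support
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_support_count}
    root = FPNode(None)  # null node
    header = {item: [] for item in frequent}
    for t in transactions:
        # Reorder by descending support (assumption: ties broken by item name)
        ordered = sorted((i for i in set(t) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                # Steps 2 and 3: no shared prefix here, so start a new branch
                child = FPNode(item, parent=node)
                node.children[item] = child
                header[item].append(child)
            # Shared prefixes are merged by incrementing the existing count
            child.count += 1
            node = child
    return root, header


# Hypothetical transactions, for illustration only
tree, links = build_fp_tree([["i1", "i2", "i3"], ["i1", "i3"], ["i2", "i4"]], 2)
print({item: [n.count for n in nodes] for item, nodes in links.items()})
```

Storing the node links in a dictionary keeps the sketch short. Mining frequent patterns from the tree would then proceed by following these links upward through the parent pointers.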

[Table 2: transaction data with the columns "Original Items" and "Reordered Frequent Items". Figure 1: fp-tree, in which the number at each node denotes the frequency of the item in the transactions.]
For the rules reported in this paper, a pruning step was applied to obtain more general frequent patterns without redundant rules: a rule X→Y is redundant if ∃X* ⊂ X such that confidence(X*→Y) ≥ confidence(X→Y) [30]. Since the goal of ARM is to find association rules between categorical variables, the age variable was grouped as age < 55, 55 ≤ age < 65, and age ≥ 65 in our analysis. The moonBook (version 0.2.3), arules (version 1.6.4), and rCBA (version 0.4.3) packages for R software (version 3.6.3; R Foundation for Statistical Computing) were used to perform the binary logistic regression and the ARM algorithm. The exact R code for ARM using the fp-growth algorithm is provided in the Supplemental Data.
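The redundancy criterion and the age grouping stated above can be illustrated with a short Python sketch. It is not the authors' R code. The function names, the label strings, and the example rules are hypothetical, and the sketch assumes that a rule is compared only against other rules present in the mined list.

```python
def age_group(age: float) -> str:
    """Discretize age into the three groups used for ARM."""
    if age < 55:
        return "age<55"
    if age < 65:
        return "55<=age<65"
    return "age>=65"


def prune_redundant(rules):
    """Drop rules X -> Y for which a proper subset X* of X gives a rule
    X* -> Y with confidence at least as high.

    rules: list of (antecedent frozenset, consequent label, confidence)
    """
    kept = []
    for ante, cons, conf in rules:
        redundant = any(
            other_cons == cons
            and other_ante < ante      # proper subset of the antecedent
            and other_conf >= conf
            for other_ante, other_cons, other_conf in rules
        )
        if not redundant:
            kept.append((ante, cons, conf))
    return kept


# Hypothetical rules, for illustration only
rules = [
    (frozenset({"obese"}), "good", 0.80),
    (frozenset({"obese", "male"}), "good", 0.78),    # adds nothing over {"obese"}
    (frozenset({"obese", "age<55"}), "good", 0.90),  # more specific and more confident
]
for ante, cons, conf in prune_redundant(rules):
    print(sorted(ante), "->", cons, conf)
```

In this toy example the second rule is removed because the shorter antecedent already reaches an equal or higher confidence, while the third rule survives because adding the age group raises the confidence.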

Results
In total, 2580 AIS patients (mean age 68.1 ± 13.0 years, 59.1% males) were included in this study (Table 3). The proportions of underweight, normal weight, overweight, and obese patients were 4.6%, 34.8%, 26.4%, and 34.1%, respectively. The proportions of overweight and obese patients were higher in the good outcome group than in the poor outcome group (p for χ² trend < 0.001). Compared to participants with a good outcome, those with a poor outcome were more likely to be older, to have a severe stroke, and to present with hypertension and diabetes. Current smoking was more frequent in patients with a good outcome than in those with a poor outcome.
Table 4 shows differences in clinical characteristics for patients according to BMI. We found that age was inversely associated with BMI and that male sex was more prevalent in the overweight group than in the other BMI groups. Hyperlipidemia and current smoking were less frequent, and cardioembolism was more frequent, in the underweight group than in the normal weight, overweight, and obese groups.
Table 5 shows the results of BLR analyses for predictors of a poor outcome at 3 months. Being underweight was a significant predictor of poor outcomes in multivariate analysis. However, obesity had a beneficial effect on the outcome at 3 months after adjusting for multiple covariates. Moreover, the results of additional logistic regression analyses that included an age × BMI interaction term and entered age and BMI as continuous variables did not differ from those of the original BLR analysis (Tables S1 and S2).
For the fp-growth algorithm, we set a minimum support bound of 0.04 and a confidence bound of 0.8, according to the average of all possible association rules that could be generated in our dataset. With this parameter setting, 329 rules were generated in total. After a pruning procedure for reducing redundant association rules, five association rules satisfied the support and confidence limits (Table 6).
Among the top five rules with high confidence and lift values, obesity, age less than 55 years, and mild stroke severity were related to a good outcome. Figure 2A depicts the relationship of these association rules according to support, confidence, and lift values. Obesity was associated with a good outcome in rules in which no diabetes, mild stroke severity, no hyperlipidemia, male sex, smoking, and younger age (<55 years) were concurrently observed. Figure 2B shows the coordination plot of significant association rules in our fp-growth algorithm. The rule combining obesity, younger age, no diabetes, no hyperlipidemia, and mild stroke severity was associated with a good outcome with the highest lift value among our rules. Table S3 shows the five most frequent association rules in each BMI subgroup. Underweight patients with severe stroke and older age were especially associated with poor outcomes.

Discussion
In the present study, logistic regression analyses in the established ischemic stroke population revealed that being obese was associated with a good outcome. However, obese patients were younger, more often male, and less likely to present with cardioembolism. In other words, the obesity paradox could be partially explained by the fact that obese patients may be younger than patients with normal BMI or may be more likely to have a stroke with a non-cardioembolic mechanism.
Welton et al. conducted a prospective investigation of 8528 patients with diabetes and reported that mortality among obese patients was less than among patients with a normal BMI [31]. However, this tendency decreased when the fitness level was high in obese patients, and the authors suggested that the fitness level measured by the metabolic equivalent of the task may be a moderator of the association between obesity and mortality. Meanwhile, age was different for each BMI group in the present study, and notably, the mean age was lowest in the obese group (BMI > 30 kg/m²). Bhaskaran et al. analyzed the data of 3.6 million participants from the UK general population for associations between BMI and mortality using prospective survival analysis, and they suggested that mortality was higher in subjects with low (<18.5 kg/m²) or high BMI (≥25 kg/m²) [32]. The average age was 37 years in their study, and the overweight or obese group was found to be 10 years older than the normal BMI group. Thus, age can be an important factor for analyzing obesity and stroke outcomes. Khan et al. analyzed the relationship between obesity and CVD using individual patient data from ten population-based cohorts [17]. The increase in BMI at cohort enrollment accelerated CVD onset, which eventually lengthened the time that patients lived with CVD. In addition, high BMI increased CVD mortality compared to non-CVD death. This study followed 19,000 patients for over 50 years overall and was stratified according to age, sex, and BMI status. This approach may have minimized selection biases such as lead-time bias or survivorship bias.
Since the obesity paradox was reported in stroke patients, there has been no clear explanation for the relationship between favorable stroke outcomes and high BMI status [10]. Results of prospective cohort studies within the general population consistently show that high BMI increases stroke risk [6][7][8][9]. However, the contradicting results that constitute the obesity paradox for stroke have been reported in studies with established stroke cohorts [10][11][12][13][14][15][16][33][34][35][36][37]. Several studies have also reported that the strength of correlation between high BMI and low stroke mortality was attenuated by age [10,13]. An additional consideration is that patients in the high BMI group of the established stroke population are consistently reported to be about 3–7 years younger than those with normal BMI in these observational stroke cohorts. According to the Danish Stroke Register report, higher BMI is associated with accelerated ischemic stroke occurrence [38]. Andersen et al. reported that obesity was associated with lower mortality and a lower risk of readmission for recurrent stroke than normal weight after applying Cox proportional hazard models [14]. However, concerns regarding the proportional hazard assumption and age discrepancies between the BMI groups have not been discussed in detail. Results from the Atherosclerosis Risk in Communities study, which adjusted for time-varying covariates using the G-estimation method, indicated that the association between stroke outcome and BMI status could vary due to mishandling of time-varying confounders [9]. Hence, age appears to be a key variable for explaining the relationship between BMI and stroke outcome. In addition, the results of the logistic regression model suggest that high BMI is related to a good outcome, but residual confounding factors related to age should be considered when interpreting these associations.
The ARM algorithm was initially designed to determine the specific purchasing patterns of people in the market and to analyze customers' behavior [18]. Since then, studies using ARM to find valuable linkages among various factors have been reported in the medical field. Szalkai et al. analyzed factors affecting cognitive decline below 15 points in the Mini-Mental Status Examination score within more than 5000 Alzheimer disease databases [39]. Factors such as high aspartate aminotransferase and high serum sodium were identified as significant rules showing a strong association with cognitive decline. In addition, ARM can specify which non-hierarchical phenotype has been achieved by effective clustering of multiple diseases [40][41][42]. However, ARM usually deals with Boolean data, so it is difficult to analyze associations among the numerical data. Further, the antecedents and consequents in an ARM analysis are not a way to depict a potential causal relationship because variables with high proportions are usually applied as antecedents.
Logistic regression analysis represents the individual risk as an odds ratio; hence, it focuses on individual risk [43]. However, ARM evaluates the entire dataset on a microscopic level and can find important patterns in the groups of interest, even though the effect sizes are small [44]. To be specific, when logistic regression and ARM are applied to the same data, these methods can reach a similar result. However, when it comes to the interpretation of the associations between the predictors and the response, logistic regression predicts the probability of the dependent variable using all the independent variables and reports the significance of each single predictor. ARM methodology finds association rules between a subset of predictors and the response, together with their frequency and significance. As shown in Supplementary Table S2, to clarify the result, an interaction term between age and BMI was included in the logistic regression to account for an additional effect between predictors on the dependent variable, but it showed no significant relationship with the response. In addition, ARM does not include prior hypotheses for statistical testing and needs to consider interactions among the variables of interest [45]. In other words, ARM is complementary to the conventional logistic regression model. Therefore, if we implement ARM in addition to logistic regression analysis when evaluating the relationship between disease and risk factors, we can form a more explanatory hypothesis by checking both the macroscopic and microscopic associations.
The present study used the data of a representative stroke population. However, our study has a few limitations. First, the results did not include numerical laboratory data such as low-density lipoprotein or glucose levels because of the innate disadvantages of the ARM algorithm. Second, a single-center, retrospective observation is prone to have a selection bias, so the results may not be generalizable to the entire stroke population. Third, we used BMI values at the time of admission for evaluating the degree of obesity; however, BMI is well-known to be a time-varying covariate. Therefore, our study does not provide results on how changes in BMI after stroke affect the outcome at 3 months. Finally, we used Asian Pacific World Health Organization criteria for the obesity categorization. Because BMI depends on age, gender, and ethnicity, it is difficult to generalize our findings to all ischemic stroke patients. Therefore, for an international comparison of the impact of BMI on the prognosis of stroke, we should consider which criteria were used for the BMI categorization. To accurately understand the impact of BMI on CVD in AIS, we need later prospective studies considering age stratification and temporal variation of BMI in the research design and analytic phase.

Conclusions
Our findings using logistic regression analysis suggest that obesity is associated with a good outcome after stroke. However, the results of ARM analysis revealed that being obese was associated with good outcomes by way of younger age at onset and mild stroke severity. This suggests that the good outcome in AIS patients was not because the patients were obese, but rather because those patients were younger and had less severe strokes than those with a normal BMI. Thus, the ARM algorithm can be used to find novel and valuable linkages among risk factors and outcomes in the medical research field.