Stroke to Dementia Associated with Environmental Risks—A Semi-Markov Model

Background: Most stroke cases lead to serious mental and physical disabilities, such as dementia and sensory impairment. Chronic diseases are contributory risk factors for stroke. However, few studies have considered the transition behavior from stroke to dementia in association with chronic diseases and environmental risks. Objective: This study aims to develop a prognosis model that addresses the transition from stroke to dementia in association with environmental risks. Design: This cohort study used data from the National Health Insurance Research Database in Taiwan. Setting: Healthcare data were obtained from more than 25 million enrollees and covered over 99% of Taiwan's entire population. Participants: In this study, 10,627 stroke patients diagnosed from 2000 to 2010 in Taiwan were surveyed. Methods: A Cox regression model and a corresponding semi-Markov process were constructed to evaluate the influence of risk factors on stroke, subsequent dementia, and the transitions between them. Main Outcome Measures: Relative risk and sojourn time were the main outcome measures. Results: Multivariate analysis showed that certain environmental risks, medication, and rehabilitation factors strongly influenced the transition from chronic disease to stroke and from stroke to dementia. This study also identified the stroke patient populations at highest risk from the environmental risk factors; males below 65 years old were the most sensitive population. Conclusion: Experiments showed that the proposed semi-Markovian model outperformed benchmark diagnosis algorithms (i.e., linear regression, decision tree, random forest, and support vector machine), with an R² of 90%. The proposed model also enabled an accurate prognosis of the transition time from chronic diseases to stroke and from stroke to dementia, accounting for environmental risks and rehabilitation factors.


Introduction
Stroke results from the sudden occlusion or rupture of a blood vessel that supplies blood to the brain. It is categorized as either ischemic or hemorrhagic, and 87% of cases are ischemic [1]. Approximately 15 million people suffer a stroke each year, and the number of stroke deaths increases annually [2]. In Taiwan, stroke is the leading cause of death and places a substantial burden on the national healthcare system [3]. Stroke is also one of the largest causes of serious long-term mental and physical disability [1,4,5] and a relevant factor in the development of dementia [6]. Previous studies have used the term poststroke dementia to designate dementia that develops over time following a stroke [7][8][9][10].

The criteria for enrollment from the National Health Insurance Research Database (NHIRD) covered two types of stroke identified from January 1, 2000 to December 31, 2010.
In the ICD-9-CM coding system of the International Classification of Diseases (ICD), hemorrhagic stroke was defined by ICD-9-CM codes 430 to 432 and ischemic stroke by ICD-9-CM codes 433 to 437. The ICD-9-CM codes for the related chronic diseases and dementia are listed in Table 1. For the modeling, the date of stroke diagnosis had to be later than the date of chronic disease diagnosis, and the date of dementia diagnosis had to be later than the date of stroke diagnosis. Most stroke patients in Taiwan were diagnosed with hypertension, diabetes mellitus, or hyperlipidemia, so this study used these three chronic diseases for the basic classification of stroke patients. Eleven types of chronic disease were grouped into 5 categories: newly diagnosed hypertension; newly diagnosed diabetes mellitus; newly diagnosed hyperlipidemia; another newly diagnosed chronic disease (chronic lung disease, hyperthyroidism, chronic kidney disease, heart failure, atrial fibrillation, sleep apnea, gout, or peripheral artery disease); and stroke patients without any of the 11 chronic diseases before the stroke. Furthermore, after being diagnosed with one chronic disease, stroke patients usually develop multiple further chronic diseases. This phenomenon is referred to here as the metabolic risk factor. This study therefore reclassified patients by their main type of chronic disease together with their metabolic risk factors. A chi-square test of significance was applied to the selected chronic diseases and related metabolic risk factors to build the semi-Markov process (SMP) model.
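The enrollment and date-ordering rules above can be expressed as a small filter. This is a minimal sketch, not the study's pipeline: the `Patient` record and its field names are hypothetical stand-ins for NHIRD claim data, and the listed stroke codes are read as the ICD-9-CM ranges 430 to 432 (hemorrhagic) and 433 to 437 (ischemic), which is an assumption about how the code list is meant.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

def stroke_type(icd9: str) -> Optional[str]:
    """Map an ICD-9-CM code to a stroke type (assumes ranges 430-432 and 433-437)."""
    try:
        major = int(icd9.split(".")[0])  # three-digit category, e.g. "434.91" -> 434
    except ValueError:
        return None  # V/E codes and malformed codes are not stroke codes
    if 430 <= major <= 432:
        return "hemorrhagic"
    if 433 <= major <= 437:
        return "ischemic"
    return None

@dataclass
class Patient:
    # Hypothetical record: first-diagnosis dates (None = never diagnosed).
    chronic_dx: Optional[date]
    stroke_dx: date
    stroke_code: str
    dementia_dx: Optional[date] = None

STUDY_START, STUDY_END = date(2000, 1, 1), date(2010, 12, 31)

def eligible(p: Patient) -> bool:
    """Apply the enrollment window and the chronic < stroke < dementia ordering."""
    if stroke_type(p.stroke_code) is None:
        return False
    if not (STUDY_START <= p.stroke_dx <= STUDY_END):
        return False
    # A chronic disease, if present, must be diagnosed before the stroke.
    if p.chronic_dx is not None and not p.chronic_dx < p.stroke_dx:
        return False
    # A dementia diagnosis, if present, must come after the stroke.
    if p.dementia_dx is not None and not p.dementia_dx > p.stroke_dx:
        return False
    return True
```

Patients with no chronic disease (`chronic_dx=None`) pass the ordering check, which matches the fifth chronic category.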

Markovian-Based Modeling
Various approaches have been applied to explore the influence of key factors on stroke occurrence over time; however, only a few have achieved high predictive accuracy [33]. Beyond standard statistical analyses such as correlation analysis, covariate analysis, and subgroup analysis, the transition probabilities of a Markov process are well suited to analyzing the effect of risk factors on stroke [33]. Among Markov variants, the SMP provides a flexible way to characterize the distribution of sojourn times between states.
Multi-state models are often used in medical research to model the development of a disease, with the different stages of the disease treated as the states of the model [34]. Parametric SMP models allow covariates to be incorporated into the sojourn time distributions, so that their effect on transition risk can be investigated with a proportional hazards regression model [35,36].
The process and the calculation of the transition probabilities of an SMP can be described as follows. Consider a model with $k$ states in the finite state space $E = \{1, 2, \dots, k\}$, and let $X_0, X_1, X_2, \dots, X_n \in E$ be the successive states visited by a random process in $n$ transitions, where $0 = T_0 < T_1 < \dots < T_n$ are the successive times of entry into these states. The probability of a jump from state $i$ to state $j$ in the embedded Markov chain can be written as

$$p_{ij} = P(X_{n+1} = j \mid X_0, \dots, X_{n-1}, X_n = i) = P(X_{n+1} = j \mid X_n = i).$$

Because a Markov chain does not model the sojourn times between transitions, an SMP additionally treats the sojourn time $T_{n+1} - T_n$ as a random variable, and the process satisfies

$$P(X_{n+1} = j,\, T_{n+1} - T_n \le t \mid X_0, \dots, X_n = i;\, T_0, \dots, T_n) = P(X_{n+1} = j,\, T_{n+1} - T_n \le t \mid X_n = i).$$

The probability density function of the sojourn time in state $i$ before passing to state $j$ is

$$f_{ij}(t) = \lim_{\Delta t \to 0} \frac{P(t < T_{n+1} - T_n \le t + \Delta t \mid X_{n+1} = j, X_n = i)}{\Delta t}.$$

The cumulative distribution function $F_{ij}(t)$ and the corresponding survival function of the waiting time in state $i$, $S_i(t)$, are defined by

$$F_{ij}(t) = P(T_{n+1} - T_n \le t \mid X_{n+1} = j, X_n = i) = \int_0^t f_{ij}(u)\,du, \qquad S_i(t) = P(T_{n+1} - T_n > t \mid X_n = i) = 1 - \sum_{j \ne i} p_{ij} F_{ij}(t).$$

The hazard function of an SMP, which represents the probability of a transition to state $j$ between time $t$ and $t + \Delta t$, given that the process has been in state $i$ for a duration $t$, is

$$h_{ij}(t) = \lim_{\Delta t \to 0} \frac{P(t < T_{n+1} - T_n \le t + \Delta t,\, X_{n+1} = j \mid T_{n+1} - T_n > t,\, X_n = i)}{\Delta t} = \frac{p_{ij}\, f_{ij}(t)}{S_i(t)}.$$

Let $m_{ij}$ be the number of covariates specific to the transition from $i$ to $j$, with covariate vector $z_{ij} = (z^{1}_{ij}, z^{2}_{ij}, \dots, z^{m_{ij}}_{ij})$; let $h_{0,ij}(t)$ be the baseline hazard function of that transition and $\beta_{ij}$ the parameter vector of the Cox model. The hazard function with covariates is

$$h_{ij}(t \mid z_{ij}) = h_{0,ij}(t)\, \exp\left(\beta_{ij}^{\top} z_{ij}\right).$$

Incorporating the Cox model not only handles multivariate models but also allows investigation of the effect on the relative risk (RR) of stroke transitions. Further, under the transition-specific covariate strategy, Weibull distributions are a flexible choice for the sojourn time distribution: they fit a wider range of hazard shapes and generalize the exponential distribution through two parameters, $W(\sigma_{ij}, \nu_{ij})$, which makes the model well suited to various hazard shapes, especially in medical research [35].
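The SMP quantities defined above can be evaluated numerically once the embedded probabilities and sojourn distributions are known. This is a sketch, assuming Weibull sojourn times with density f(t) = ν σ^(-ν) t^(ν-1) exp(-(t/σ)^ν); the states, probabilities, and (σ, ν) values are made-up illustrative numbers, not estimates from this study, and the code is not how any particular package computes them.

```python
import math

def weibull_pdf(t, sigma, nu):
    """Weibull density nu * sigma**(-nu) * t**(nu - 1) * exp(-(t / sigma)**nu), for t > 0."""
    return nu / sigma * (t / sigma) ** (nu - 1) * math.exp(-((t / sigma) ** nu))

def weibull_cdf(t, sigma, nu):
    """Weibull distribution function F(t) = 1 - exp(-(t / sigma)**nu)."""
    return 1.0 - math.exp(-((t / sigma) ** nu))

def survival(i, t, P, params):
    """S_i(t) = 1 - sum_j p_ij F_ij(t): probability of still being in state i after duration t."""
    return 1.0 - sum(p_ij * weibull_cdf(t, *params[(i, j)]) for j, p_ij in P[i].items())

def hazard(i, j, t, P, params, beta=(), z=()):
    """Transition hazard h_ij(t | z) = p_ij f_ij(t) / S_i(t) * exp(beta . z)."""
    baseline = P[i][j] * weibull_pdf(t, *params[(i, j)]) / survival(i, t, P, params)
    # Proportional hazards: binary covariates multiply the baseline by exp(beta . z).
    return baseline * math.exp(sum(b * x for b, x in zip(beta, z)))

# Toy example (made-up numbers): hypertension -> hemorrhagic or ischemic stroke.
P = {"HT": {"HEM": 0.3, "ISC": 0.7}}
PARAMS = {("HT", "HEM"): (5.0, 1.2), ("HT", "ISC"): (4.0, 1.5)}  # (sigma, nu)
h_covariate = hazard("HT", "ISC", 2.0, P, PARAMS, beta=(0.2,), z=(1,))
```

With ν = 1 for every transition out of a state, the hazard is constant (p_ij / σ), which is a quick sanity check on the formulas.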
The Weibull sojourn time distribution is defined by

$$f_{ij}(t) = \nu_{ij}\, \sigma_{ij}^{-\nu_{ij}}\, t^{\nu_{ij} - 1} \exp\left[-\left(\frac{t}{\sigma_{ij}}\right)^{\nu_{ij}}\right], \qquad F_{ij}(t) = 1 - \exp\left[-\left(\frac{t}{\sigma_{ij}}\right)^{\nu_{ij}}\right],$$

where $\sigma_{ij}$ and $\nu_{ij}$ are the scale and shape parameters; with $\nu_{ij} = 1$ it reduces to the exponential distribution.

Covariate analysis and Markovian-based prediction modeling were used in this study. As a limitation of the approach, all covariates included in the SMP model must be categorical (Table 2). The environmental variables are defined in Table 3. In this study, a value more than one standard deviation from the mean was assigned to the high-incidence group. The divorce and unemployment rates were obtained from the Socio-Economic Geographic Information System (SEGIS) [37]. The temperature data were obtained from the Taiwan Typhoon and Flood Research Institute (TTFRI) [38]. The air pollution index was obtained from the environmental resource database of the Environmental Protection Administration (EPA) [39]. The rate of elderly people living alone was obtained from the Ministry of Health and Welfare [40]. Each patient was characterized by one set of covariates to assess the influence of the environmental risk factors on stroke patients. Individual patients were linked to the environmental data through their registered residence as recorded in the NHIRD. The environmental risk factors were collected for 22 administrative districts of Taiwan.
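The one-standard-deviation rule and the residence-based linkage can be sketched as follows. District names and values are illustrative; the use of the population standard deviation across districts (`statistics.pstdev`) and the hypothetical `direction` argument (flagging high values for factors such as divorce rate, low values for temperature) are assumptions, since the text does not specify either.

```python
import statistics

def binarize_by_sd(values: dict, direction: str = "high") -> dict:
    """Flag districts whose value lies more than one SD beyond the district mean.

    direction="high" flags values above mean + SD; direction="low" flags values
    below mean - SD (hypothetical option for factors such as low temperature).
    """
    vals = list(values.values())
    mean, sd = statistics.mean(vals), statistics.pstdev(vals)
    if direction == "high":
        return {d: int(v > mean + sd) for d, v in values.items()}
    if direction == "low":
        return {d: int(v < mean - sd) for d, v in values.items()}
    raise ValueError("direction must be 'high' or 'low'")

def link_patients(residence: dict, district_flags: dict) -> dict:
    """Attach each patient's district-level binary covariate via the registered residence."""
    return {pid: district_flags[district] for pid, district in residence.items()}
```

Each patient then carries one binary indicator per environmental factor, which fits the requirement that SMP covariates be categorical.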
The stroke patients were classified into nine states in the SMP model, comprising five chronic disease states, two types of stroke, and two types of dementia, to explore the transitions among states (Figure 2). This study combined hypertension and hyperlipidemia into one state to investigate the relationship between chronic disease and stroke. The states are defined in Table 4. The chronic disease states were: hypertension, including patients with one metabolic risk factor after the hypertension diagnosis; diabetes mellitus; hyperlipidemia, including patients with one metabolic risk factor after the hyperlipidemia diagnosis; more than four metabolic risk factors after another chronic disease; and none of the 11 types of chronic disease. These chronic states are the initial states of the SMP model. Hemorrhagic and ischemic stroke are the stroke states, whereas vascular dementia and non-vascular dementia are the dementia states.
The state diagram of the SMP model is depicted in Figure 2. The arrows indicate transitions among states, and the transition probability matrix was computed from our stroke patient population. Weibull distributions were used to model the sojourn time of each transition. This study focused on transitions involving stroke: from a chronic disease state to a stroke state, and from a stroke state to a dementia state or remaining in the stroke state. Such a multistate SMP model can be fitted with the R package semiMarkov [33].
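The layered state space can be written down explicitly. This sketch uses hypothetical state labels, lists the 5 × 2 chronic-to-stroke and 2 × 2 stroke-to-dementia transitions (14 in total, which across 7 covariates gives the 98 combinations reported in the Multivariate Analysis section), and estimates the embedded probabilities p_ij as row-normalized counts of observed jumps. That estimator is the usual nonparametric one; whether the study computed its matrix exactly this way is an assumption, and the observations are toy data.

```python
from collections import Counter
from itertools import product

CHRONIC = ["HT", "DM", "HL", "OTHER", "NONE"]  # hypothetical labels for the 5 chronic states
STROKE = ["HEM", "ISC"]                        # hemorrhagic, ischemic
DEMENTIA = ["VD", "NVD"]                       # vascular, non-vascular

# Allowed transitions: chronic -> stroke (first layer), stroke -> dementia (second layer).
TRANSITIONS = list(product(CHRONIC, STROKE)) + list(product(STROKE, DEMENTIA))

def estimate_embedded_matrix(observed):
    """Estimate p_ij = n_ij / n_i from observed (from_state, to_state) jumps.

    Patients who remain in a stroke state are censored: they make no jump and
    are not counted. Only allowed transitions appear in the result.
    """
    counts = Counter(observed)
    totals = Counter(i for i, _ in observed)
    return {(i, j): counts[(i, j)] / totals[i] for (i, j) in TRANSITIONS if totals[i] > 0}
```

States with no observed jumps are omitted rather than given an undefined 0/0 row.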

Multivariate Analysis
In the covariate model, we obtained 98 transition-covariate combinations: 14 transitions, namely 10 in the first layer (5 types of chronic disease × 2 types of stroke) and 4 in the second layer (2 types of stroke × 2 types of dementia), each evaluated for 7 covariates (gender, age, and the five environmental risk factors), giving 14 × 7 = 98. Each coefficient was assessed with a Wald test (H0: β_ij = 0; H1: β_ij ≠ 0) at a significance level of p < 0.05.
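For a single Cox coefficient, the Wald test compares z = β̂ / SE(β̂) with the standard normal distribution (equivalently, z² with a chi-square distribution on 1 degree of freedom). A minimal sketch; the coefficient and standard error below are placeholders, not values from Table 5.

```python
import math

def wald_test(beta: float, se: float, alpha: float = 0.05):
    """Two-sided Wald test of H0: beta = 0 for one Cox regression coefficient."""
    z = beta / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p, p < alpha
```

For example, `wald_test(1.96, 1.0)` gives a p-value of about 0.05, right at the significance threshold.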
The multivariate model included all covariates that were statistically significant for a transition in the univariate analysis. The results of the multivariate analysis are presented in Table 5. Three sets of models were built: A, environmental risk factors; B, medication and rehabilitation factors; and C, all factors. A negative coefficient estimate β in the Cox regression model indicates that the transition risk was higher for the base group than for the other groups. For the transitions from chronic diseases to stroke in model A, age, unemployment rate, and temperature frequently showed significant effects. The risk of transition from hypertension (RR = 1.24) and hyperlipidemia (RR = 1.64) to hemorrhagic stroke was higher in males than in females (the base group). For age, patients under 65 years old had a higher risk. For the transition from diabetes mellitus to hemorrhagic stroke, a high divorce rate was associated with 3.91 times the risk of the base group. Conversely, for the unemployment rate, the base group had a higher risk than the high-unemployment group. Low temperature was associated with 1.27 to 1.72 times the risk of normal temperature. High air pollution was associated with 1.07 and 1.31 times the risk of transition to hemorrhagic stroke from hypertension and from the other eight chronic diseases, respectively.
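The sign convention above follows from the Cox model: for a binary covariate, the reported relative risk is the hazard ratio RR = exp(β), so β < 0 corresponds to RR < 1, meaning the base group carries the higher risk. A small sketch, assuming the RR values in Table 5 are Cox hazard ratios of this form.

```python
import math

def relative_risk(beta: float) -> float:
    """Hazard ratio (reported as relative risk) for a binary covariate: RR = exp(beta)."""
    return math.exp(beta)

def coefficient(rr: float) -> float:
    """Inverse mapping beta = ln(RR); negative when RR < 1 (base group at higher risk)."""
    return math.log(rr)
```

For instance, the divorce-rate effect RR = 3.91 corresponds to β of about 1.36, while the unemployment-rate effects, with negative β, correspond to RR values below 1.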
For the transitions from stroke to dementia, only the base group (normal temperature) showed a higher risk, for example in the transition from ischemic stroke to vascular dementia (RR = 0.73) and to non-vascular dementia (RR = 0.64).
To validate the effect of the environmental risk factors, this study compared model A with a model of the medication and rehabilitation of stroke patients (model B). Moreover, this study built an overall multivariate model (model C) whose covariates were selected from models A and B. Model C showed that the environmental risk factors retained their risk effects on the transitions. However, two transitions were removed from model C.

Findings in SMP
The predictive model and its results, including the mean sojourn time, standard deviation, and adjusted R², are shown in Table 6. The predicted overall sojourn time was 1.96 years for the transition from diabetes mellitus to ischemic stroke. Among the transitions from stroke to dementia, hemorrhagic stroke to non-vascular dementia had a sojourn time of 1.29 years. The R² values of all transitions were greater than 0.8, indicating that the overall prediction model explained more than 80% of the variance in sojourn time.
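The text does not state how the mean and standard deviation in Table 6 were obtained. If they were derived from fitted Weibull parameters (an assumption), they follow the closed-form moments mean = σΓ(1 + 1/ν) and SD = σ√(Γ(1 + 2/ν) − Γ(1 + 1/ν)²). A sketch with illustrative parameters only:

```python
import math

def weibull_mean(sigma: float, nu: float) -> float:
    """Mean sojourn time of a Weibull(scale=sigma, shape=nu): sigma * Gamma(1 + 1/nu)."""
    return sigma * math.gamma(1.0 + 1.0 / nu)

def weibull_sd(sigma: float, nu: float) -> float:
    """Standard deviation: sigma * sqrt(Gamma(1 + 2/nu) - Gamma(1 + 1/nu)**2)."""
    g1 = math.gamma(1.0 + 1.0 / nu)
    g2 = math.gamma(1.0 + 2.0 / nu)
    return sigma * math.sqrt(g2 - g1 ** 2)
```

With ν = 1 the Weibull reduces to the exponential distribution, so both the mean and the SD equal σ.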
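The adjusted R² is defined as

$$R^2_{\mathrm{adj}} = 1 - \left(1 - R^2\right) \frac{n - 1}{n - p - 1},$$

where p is the number of regressors and n is the sample size.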
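The adjusted R² penalizes R² for the number of regressors. A minimal sketch of both quantities, computed from observed and predicted sojourn times; the inputs are illustrative.

```python
def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

def adjusted_r_squared(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 for n observations and p regressors (requires n > p + 1)."""
    if n <= p + 1:
        raise ValueError("need n > p + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
```

For example, R² = 0.9 with n = 100 and p = 7 gives an adjusted R² of about 0.892.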
The covariate-specific predictions of sojourn time indicated that the best-performing model differed across chronic diseases. In general, the adjusted R² of model A exceeded that of the other models for most transitions from chronic disease to both hemorrhagic and ischemic stroke, indicating that the environmental risk factors have high explanatory power for predicting transition time. For the transitions from stroke to dementia, model C performed better than the other models, except for the transition from hemorrhagic stroke to vascular dementia, for which model A was preferred. As shown in Figure 3, transition time prediction was markedly improved in the proposed model compared with a model considering only medication and rehabilitation. Table 7 shows that the SMP prediction model has a high R², indicating that the proposed transition time prediction model has better explanatory power than linear regression, decision tree, random forest, and support vector machine (SVM) [29].

Model C considered age, gender, environmental risk factors, and the medication and rehabilitation of stroke patients as covariates. This model indicated that drugs for lowering blood pressure, blood lipids, and blood sugar decreased the risk of most transitions from chronic disease to stroke, as well as the transition from hypertension to ischemic stroke. In addition, drug interaction data revealed that patients taking all three drugs during the same period had a low transition risk. Only the blood pressure-lowering drug influenced the transition from stroke to dementia (RR = 0.44).
For rehabilitation, the results suggested that for people who suffer from ischemic stroke, active participation in rehabilitation could reduce the risk of transition to non-vascular dementia (RR = 0.83).
The multivariate analysis indicated that divorce rate (RR = 3.91), temperature (RR = 1.27, 1.5, 1.72), and air pollution (RR = 1.07, 1.31) were positively associated with the transition from chronic disease to stroke. By contrast, the unemployment rate had a negative coefficient, indicating a lower transition risk in the high-unemployment group. For the transition from stroke to dementia, the risk was lower at low temperature than at normal temperature (RR = 0.73, 0.64).
After all the environmental factors were included in the multivariate analysis, two transitions that appeared in the medication and rehabilitation model were removed. Once the explanatory power of the environmental factors is accounted for, these two transitions do not substantially influence the transition from stroke to dementia.

Conclusions
This study enables an accurate prognosis of the transition behavior and transition time from chronic diseases to stroke and from stroke to dementia, accounting for environmental risks, medication, and rehabilitation factors, using a multilayer, multistate SMP model. The results indicated that the covariates incorporated in the study had different effects on sojourn time in stroke transitions. The multivariate analysis revealed that a high divorce rate, high air pollution, and low temperature were high-risk factors that triggered the transition from a chronic disease to stroke. Previous researchers associated cold and hot temperatures with an increased risk of stroke mortality [19,41]; whereas their studies investigated mortality owing to stroke, the present study confirmed the effect of temperature on the transition from a chronic disease to stroke. Roy and Ray [42] reported that hyperthermia in acute stroke patients was a poor short-term prognostic parameter; they focused on body temperature, whereas the present study investigated environmental temperature. Lim et al. [43] reported that diurnal temperature change over the preceding 24 h is associated with daily stroke incidence; their study analyzed the short-term effect of environmental temperature, whereas the present study addressed its long-term, annual effect. Age and temperature are high-risk factors for the transition from stroke to dementia. Compared with a covariate model of medication and rehabilitation from a previous study, the proposed environmental factor-based prognosis model had good explanatory power for the risk effects on state transitions. The present study also identified the most sensitive population through subgroup analysis: males below 65 years old were the most sensitive to the environmental risk factors.
Among all factors, air pollution had the greatest influence on the transition from a chronic disease to stroke, and the unemployment rate had the greatest influence on the transition from stroke to dementia. The constructed SMP model predicts how long stroke patients stay in each state (i.e., from chronic disease to stroke, and from stroke to dementia). Comparing the adjusted R² with that of other benchmark algorithms (linear regression, decision tree, random forest, and SVM) showed that the proposed prediction model had the highest prediction performance. We also confirmed that the Weibull distribution fits medical survival data flexibly.
Future research may incorporate additional environmental risk factors, such as marital status, socioeconomic status, dietary habits, employment types, and water quality, which may improve the prediction of stroke transition behavior. Given the lack of information integration between environmental and medical data, this study linked these data only through the patient's registered residence. Integrating these heterogeneous data precisely at the individual level remains a challenge worth further investigation. The environmental risk factors served as binary covariates in the proposed model because of computational constraints; future studies may investigate the influence of numerical covariates. The current SMP model may also be extended with a death state to investigate the survival time of stroke patients.