Dynamic Changes and Temporal Association with Ambient Temperatures: Nonlinear Analyses of Stroke Events from a National Health Insurance Database

Background: The associations between ambient temperatures and stroke remain uncertain despite extensive study. Furthermore, the impact of latitudes or climate zones on these associations is still controversial. The Tropic of Cancer passes through the middle of Taiwan and divides it into subtropical and tropical areas. Therefore, the Taiwan National Health Insurance Database can be used to study the influence of latitudes on the association between ambient temperature and stroke events. Methods: In this study, we retrieved daily stroke events from 2010 to 2015 in the New Taipei and Taipei Cities (the subtropical area) and Kaohsiung City (the tropical area) from the National Health Insurance Research Database. Overall, 70,338 and 125,163 stroke events, including ischemic stroke and intracerebral hemorrhage, in Kaohsiung City and the Taipei Area were retrieved from the database, respectively. We also collected daily mean temperatures from the Taipei and Kaohsiung weather stations during the same period. The data were decomposed by ensemble empirical mode decomposition (EEMD) into several intrinsic mode functions (IMFs). There were consistent 6-period IMFs with intervals around 360 days in most decomposed data. Spearman’s rank correlation test showed moderate-to-strong correlations between the relevant IMFs of daily temperatures and events of stroke in both areas, which were higher in the northern area compared with those in the southern area. Conclusions: EEMD is a useful tool to demonstrate the regularity of stroke events and their associations with dynamic changes of the ambient temperature. Our results clearly demonstrate the temporal association between the ambient temperature and daily events of ischemic stroke and intracranial hemorrhage. These findings may contribute to seasonal planning of healthcare services for stroke. Further well-designed prospective studies are needed to elucidate the meaning of these associations.


Introduction
Cerebrovascular disease, including ischemic stroke (IS), intracerebral hemorrhage (ICH), subarachnoid hemorrhage, and miscellaneous entities, is one of the most common causes of death around the world [1][2][3]. In addition to its high mortality rate, severe morbidity causes suffering for patients and places burdens on families and society [4]. The risk factors of stroke, including atrial fibrillation, dyslipidemia, hypertension, and diabetes mellitus, have been well studied [5][6][7]. The association between climate and stroke incidence has been observed clinically for a long time. Previous studies have suggested that changes in ambient temperature or other meteorological parameters, including air pressure, humidity, sunshine duration, and rainfall, might trigger a stroke [8][9][10]. There are several hypotheses regarding the possible mechanisms of stroke related to weather changes, including increased sympathetic tone resulting in elevated blood pressure in cold weather or vigorous temperature changes [11] and dehydration by excessive perspiration causing hyperviscosity in hot weather [12,13]. However, the reported associations between ambient temperature and stroke incidence remain inconsistent. One study reported the highest incidence in the summer [14]. In contrast, other studies reported peak incidences in the winter or spring [10,15]. Differences between studies may be due to different risk factors and health care systems in the study areas [16]. National health insurance covers most residents in Taiwan and offers a sufficient and equal health care system countrywide compared with some other countries. The claims database of the national health insurance, the National Health Insurance Research Database (NHIRD), has been used for studies of risk factors, incidences, and changes in therapeutic strategies for disease by big data analysis [17].
The Tropic of Cancer passes through the center of Taiwan and divides the island into subtropical (northern) and tropical (southern) areas (Figure 1). Thus, it is a suitable model to investigate the association between daily temperature and stroke incidence and the impact of the climate or latitude on a potential association by analyzing the NHIRD in two areas with different climate zones [18,19]. Stroke is a vascular incident that mainly results from progressive atherosclerosis or vascular anomalies. Therefore, more than one stroke can occur over a lifetime. However, most studies have focused on the incidence of stroke or first-ever stroke. A prospective study showed that stroke recurrence rates were 5.7% and 22.5% within one and five years, respectively. The recurrence rates of IS were higher than those for ICH in China [20]. A hospital-based study of NHIRD found that the one-year recurrence of ischemic stroke was 9.6% and 7.8% in 2000 and 2001, respectively [21]. Therefore, when investigating these dynamic changes and temporal association with ambient temperatures, stroke events are more relevant than the incidence or case number of the first-ever stroke. The use of the first five columns of discharge diagnosis of claimed data for hospitalized patients is a valid and reliable method to estimate stroke incidence in NHIRD [22]. To decrease the possibility of overestimation, we added brain imaging studies to the inclusion criteria to confirm the events of stroke in this study.
A previous study reported that more than 60% of Taiwanese stroke patients arrived at hospital within 24 h in 1997 [23]. Another hospital-based study conducted in Kaohsiung from June 2004 to October 2005 showed that 93% of 129 patients with acute ischemic stroke presented to emergency services within 3 h [24]. The rate has increased following promotions by the government, as well as patient and academic societies. The awareness of early stroke features, as well as convenient, fast transportation and efficient healthcare and claim systems make it feasible to obtain reliable daily events of stroke from the NHIRD.
Empirical mode decomposition (EMD) is an algorithm for decomposing and analyzing nonlinear and nonstationary signals. EMD decomposes a nonstationary signal into a number of intrinsic mode functions (IMFs), each of which is a mono-component function, by a repeated sifting procedure based on the envelopes of the local maxima and minima. Briefly, local maxima and minima are identified to generate upper and lower envelopes using a cubic spline method. The mean of the upper and lower envelopes is then subtracted from the signal, and this sifting is repeated until the remaining component qualifies as the first IMF (IMF1). The IMF1 is subtracted from the original signal to obtain a new series. The new series is processed with the same procedure to obtain the second IMF (IMF2). The procedure is repeated until the residual becomes monotonic. Therefore, oscillating intermixed signals can be decomposed into several IMFs. Meanwhile, the dominant frequency of the IMFs decreases during the procedure until a residual trend arises [25]. However, certain problems such as mode mixing impede the use of this algorithm. Ensemble empirical mode decomposition (EEMD) has been developed to overcome these issues. After adding appropriate white noise, the original nonlinear data can be decomposed into different IMFs by EMD [26]. We have used EMD and EEMD to decompose and analyze nonlinear physiological signals, including digital volume pulse, ECG, R-R intervals, and surface electromyography, to extract stable and purified unique physical signals for analysis [26,27]. In this study, we used EEMD to decompose nonstationary daily stroke events and ambient temperatures in the Taipei Area, located in a subtropical zone, and Kaohsiung City, located in a tropical zone, to verify the association between stroke occurrence and temperature and the impact of climate zones on these associations.

Data Source and Definition of Stroke Onset
We retrieved data of stroke patients with IS or ICH aged 20 years or older who were diagnosed with stroke from the NHIRD using the following criteria: (1) admitted via emergency services with a diagnostic code of stroke in any column of their discharge diagnoses (up to five); (2) having received head CT or brain MRI by emergency services or during hospitalization from January 2010 to December 2015. Stroke diagnoses were coded using the International Classification of Diseases, ninth revision with clinical modification (ICD-9-CM code) for 2010-2014 and ICD-10 in 2015, following the regulations of the insurance claims of the national health insurance. The diagnoses of IS were coded as 433.xx, 434.xx, or 436.xx, excluding 433.x0 and 434.x0, with ICD-9-CM and as I63, I67.8, I67.9, G46, or H34 with ICD-10 in 2015. The diagnoses of ICH were coded as 430, 431, 432, or 852 with ICD-9-CM and I60 or I61 with ICD-10 [22,28]. The onset of stroke was defined as the date of visiting the emergency service. The retrieved stroke patients were divided into a middle-aged group (20 to 59 years) and an elderly group (60 years or older). We obtained the daily mean temperature data for the Taipei and Kaohsiung weather stations from the Central Weather Bureau, Taiwan, for January 2010-December 2015. The Institutional Review Board of Hualien Tzu Chi General Hospital approved this research protocol (IRB 107-188-C).
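The coding rules above can be expressed as a small classifier. The following Python sketch is only an illustration of the stated code lists, not the query actually run against the NHIRD; the function names and the handling of 433/434 codes that lack a fifth digit (treated here as not IS) are assumptions.

```python
from typing import Iterable, Optional

# Code lists taken from the text; dots are removed before matching.
IS_ICD10 = ("I63", "I678", "I679", "G46", "H34")
ICH_ICD10 = ("I60", "I61")
ICH_ICD9 = ("430", "431", "432", "852")


def classify_stroke(code: str, icd_version: int) -> Optional[str]:
    """Return "IS", "ICH", or None for one discharge-diagnosis code."""
    c = code.strip().upper().replace(".", "")
    if icd_version == 10:
        if c.startswith(IS_ICD10):
            return "IS"
        if c.startswith(ICH_ICD10):
            return "ICH"
        return None
    # ICD-9-CM branch
    if c.startswith(ICH_ICD9):
        return "ICH"
    if c.startswith("436"):
        return "IS"
    if c.startswith(("433", "434")):
        # 433.x0 and 434.x0 are excluded. Assumption: a code without a
        # fifth digit is also not counted as IS.
        return "IS" if len(c) >= 5 and c[4] != "0" else None
    return None


def first_stroke_type(diagnoses: Iterable[str], icd_version: int) -> Optional[str]:
    """Scan up to five discharge diagnoses and return the first stroke type found."""
    for code in list(diagnoses)[:5]:
        kind = classify_stroke(code, icd_version)
        if kind is not None:
            return kind
    return None
```

For example, `first_stroke_type(["401.9", "434.11"], 9)` returns `"IS"`, whereas `classify_stroke("433.10", 9)` returns `None` because of the fifth-digit exclusion.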

Meteorological and Geographic Background of the Study Design
Taipei, New Taipei, and Kaohsiung are big cities in Taiwan. The first two cities are located in the subtropical region, and the latter is in the tropical region. Taipei and New Taipei are close together, and more than six million people live and work there. It is not easy to identify patients' residence using hospital-based claims data in a disease incidence study. Therefore, we used the Taipei Area, including Taipei City and New Taipei City, to estimate the occurrence of stroke in the subtropical area in this study. Kaohsiung City is the biggest city in the tropical region, with a population of 2.7 million. We used this city to study stroke onset in a tropical area (Figure 1). Although the distance from Kaohsiung City to the Taipei Area is only 350 km, the climates are quite different between these areas. Kaohsiung City is located in an Am (tropical, monsoon) zone, and the Taipei Area is in a Cfa (temperate, no dry season, hot summer) zone, according to the Köppen-Geiger climate classification [29].

Ensemble Empirical Mode Decomposition of Daily Stroke Incidence and Ambient Temperature
We used an EEMD algorithm to decompose the daily mean temperatures and stroke events, as previously described. Appropriate white noise was added to the original data before decomposition. Then, the new series was decomposed into several IMFs with EMD. The decompositions were repeated with a different white noise series in each trial. Finally, the ensemble means of the corresponding IMFs of the decomposed data were defined as the final result [30].
Empirical mode decomposition (EMD) is a versatile adaptive time-frequency data analysis method for extracting signals from data generated in noisy nonlinear and nonstationary processes in a variety of situations. However, the frequent appearance of mode mixing, which is the consequence of signal intermittency (defined as a single intrinsic mode function (IMF), either consisting of signals of widely disparate scales or a signal of a similar scale residing in different IMF components), has been reported to be one of the major drawbacks of the original EMD. The signal intermittency can not only cause ambiguity in the time-frequency distribution, but also obscure the physical meaning of individual IMFs.
To counter this problem, ensemble empirical mode decomposition (EEMD) [31] was adopted in the current study. The EEMD method consists of an ensemble of data decompositions with added white noise and then treats the resultant mean as the final true result. The principle of EEMD is to add white noise, which populates the whole time-frequency space uniformly with the constituent components of different scales separated by a filter bank. The EEMD process is explained as follows: (1) Add a white noise series to the targeted data; (2) Decompose the data with added white noise into IMFs; (3) Repeat step 1 and step 2 again and again, but with different white noise series each time; (4) Obtain the (ensemble) means of corresponding IMFs of the decompositions as the final result.
The result of EEMD is obtained when the number of trials in the ensemble approaches infinity (Equation (1)):
$$c_i(t) = \lim_{N \to \infty} \frac{1}{N} \sum_{k=1}^{N} \left[ c_{i,k}(t) + \alpha r_k(t) \right] \qquad (1)$$
in which $c_{i,k}(t) + \alpha r_k(t)$ is the k-th realization of the i-th IMF in the noise-added signal, $\alpha$ is the standard deviation of the added noise, and $r_k(t)$ is the residual after extracting the first k IMF components. The number of trials in the ensemble, N, has to be large.
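The four EEMD steps listed above can be sketched in Python. This is a minimal illustration, not the implementation used in this study: it assumes NumPy and SciPy are available, and the function names, the fixed number of sifting passes, the use of the series endpoints as envelope knots, the noise amplitude (0.2 of the signal SD), and the ensemble size are all assumptions chosen for the example.

```python
import numpy as np
from scipy.interpolate import CubicSpline


def _extrema(x):
    """Indices of interior local maxima and minima."""
    d = np.diff(x)
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    return maxima, minima


def _sift(x, n_sift=10):
    """Repeatedly subtract the mean of the cubic-spline envelopes."""
    h = x.copy()
    t = np.arange(len(x))
    last = len(x) - 1
    for _ in range(n_sift):
        mx, mn = _extrema(h)
        if len(mx) < 2 or len(mn) < 2:
            break
        # Assumption: the endpoints are used as extra knots for both envelopes.
        upper = CubicSpline(np.r_[0, mx, last], np.r_[h[0], h[mx], h[-1]])(t)
        lower = CubicSpline(np.r_[0, mn, last], np.r_[h[0], h[mn], h[-1]])(t)
        h = h - (upper + lower) / 2.0
    return h


def emd(x, n_imfs):
    """Plain EMD returning exactly n_imfs rows (zero rows once no oscillation is left)."""
    residual = np.asarray(x, dtype=float).copy()
    imfs = np.zeros((n_imfs, len(residual)))
    for i in range(n_imfs):
        mx, mn = _extrema(residual)
        if len(mx) < 2 or len(mn) < 2:
            break
        imfs[i] = _sift(residual)
        residual = residual - imfs[i]
    return imfs, residual


def eemd(x, n_imfs=6, noise_std=0.2, n_trials=50, seed=0):
    """Steps (1)-(4): add noise, decompose, repeat, average corresponding IMFs."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    scale = noise_std * np.std(x)
    imf_sum = np.zeros((n_imfs, len(x)))
    res_sum = np.zeros(len(x))
    for _ in range(n_trials):
        noisy = x + rng.normal(0.0, scale, len(x))  # step (1)
        imfs, res = emd(noisy, n_imfs)              # step (2)
        imf_sum += imfs                             # step (3) accumulates trials
        res_sum += res
    return imf_sum / n_trials, res_sum / n_trials   # step (4)
```

With a finite ensemble, the sum of the averaged IMFs and the residual reproduces the input only up to the mean of the added noise, which shrinks as the number of trials grows.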

Statistical Analysis
Data are expressed as the mean ± standard deviation (SD). The significance of differences in daily mean temperatures and events of stroke between Kaohsiung City and the Taipei Area was determined by the nonparametric Mann-Whitney U-test, with P < 0.05 considered significant. Correlations between the daily mean temperatures and stroke events were evaluated by Spearman's rank correlation coefficients, with P < 0.05 considered significant. All statistical analyses were performed using STATA software (version 16.0 for Windows; STATA Corp. LLC, Lakeway Drive, College Station, TX, USA).
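For readers unfamiliar with Spearman's rank correlation, the coefficient is the Pearson correlation of the ranks of the two series, with tied values given their average rank. The following pure-Python sketch illustrates the calculation only; the analyses in this study were run in STATA, and the function names and the example values in the usage note are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

As a made-up example, temperatures `[16, 18, 22, 26, 29, 30]` paired with event counts `[14, 13, 11, 9, 8, 8]` give a strongly negative coefficient, the pattern expected when events fall as temperature rises.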

Stroke Events and Ambient Temperatures in the Studied Areas
In total, 40,315 and 72,073 IS events were identified in Kaohsiung City and the Taipei Area, respectively, from the NHIRD. Table 1 shows that the event rates were 3.09 ± 0.09/1000/year and 2.25 ± 0.04/1000/year in Kaohsiung City and the Taipei Area, respectively. After adjustment for the numbers of permanent residents stratified by age (middle-aged and elderly groups), the IS events were 0.87 ± 0.06/1000/year in the middle-aged group and 12.86 ± 0.51/1000/year in the elderly group in Kaohsiung City, whereas the events were 0.63 ± 0.02/1000/year and 10.18 ± 0.17/1000/year in the same age groups in the Taipei Area. There were statistically significant differences in stroke events between these two areas for each age group. During the study period, 30,023 and 53,090 ICH events occurred in Kaohsiung City and the Taipei Area, respectively. The occurrence rates were 2.30 ± 0.12/1000/year and 1.66 ± 0.08/1000/year in Kaohsiung City and the Taipei Area, respectively. The occurrence rates were 1.25 ± 0.08/1000/year in the middle-aged group and 6.93 ± 0.80/1000/year in the elderly group in Kaohsiung City and 0.84 ± 0.03/1000/year and 5.65 ± 0.41/1000/year in the same age groups in the Taipei Area. There were also statistically significant differences between these two areas for each age group. The daily mean temperatures in Kaohsiung City were significantly higher than those in the Taipei Area, 25.38 ± 3.98 °C vs. 23.26 ± 5.55 °C. The daily temperature differences (the difference between the daily highest and lowest temperatures) were higher in the Taipei Area than those in Kaohsiung City, 5.85 ± 2.67 °C vs. 5.59 ± 1.64 °C. The coefficients of variation of the daily temperature difference were lower in Kaohsiung City than those in the Taipei Area, 29.0% vs. 45.6%.
Figure 2 shows the monthly stroke events during the six years. The events of IS were higher than those of ICH in the middle-aged and elderly groups in Kaohsiung City (a). Similar findings were also found in the Taipei Area (b). Moreover, there were faint peaks for ICH events in all age groups in both areas and IS events in the elderly group in the Taipei Area in the cold season around months 12, 24, 36, 48, 60, and 72.

Association between Mean Ambient Temperature and Monthly Stroke Events
We used Spearman's rank correlation coefficients to assess the correlation between the monthly mean temperatures and stroke events. Table 2 shows significant correlations between the temperatures and ICH events in the middle-aged and elderly groups in both studied areas. However, there was only a significant correlation in the elderly IS group in the Taipei Area. Only one correlation coefficient of >0.5 was determined, which suggests a moderate correlation between the monthly mean temperatures and ICH events in the elderly group in the Taipei Area; nevertheless, there were several significant correlations between temperature and stroke events in both areas.

Decompositions of the Daily Mean Temperatures and Stroke Events
We used EEMD to decompose the daily mean temperatures and stroke events. For example, Figure 3 shows the IMFs of the daily mean temperatures (a) and events of ICH in the elderly group (b) in the Taipei Area. There were several obvious periodical signals for IMFs 4-8. To check the consistency of the periodical signals, we defined interpeak periods (IPPs) as the intervals between two consecutive peaks. The consistency of IPPs was estimated by the coefficient of variation (CV), calculated as (SD/mean) × 100%. Table 3 shows the IPPs of IMFs 4-8 of daily ambient temperatures and stroke events in the two studied areas. These data clearly show that the IPPs around 360 days had the lowest CV, except for IMF8 of IS in the middle-aged group and IMF7 for ICH in the elderly group in the Taipei Area. The six-period IMFs were present in all decomposed data, as shown in Figure 4. After decomposing the original data, we used Spearman's rank correlation coefficients to examine the correlation between relevant IMFs, that is, IMFs of ambient temperatures and stroke events with similar IPPs. For example, Figure 5 shows the relevant IMFs of ambient temperatures and events of ICH and the correlation coefficients of the IMFs in the elderly group in the Taipei Area. There was a high correlation coefficient between IMF6 of the daily mean temperatures and IMF7 of ICH events.
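The IPP and CV definitions can be illustrated with a short Python sketch. It is a simplified example under stated assumptions rather than the procedure used to build Table 3: a peak is taken as any interior sample larger than its left neighbor and at least as large as its right neighbor, and the SD is the sample SD; the function names are hypothetical.

```python
import math
from statistics import mean, stdev


def peak_indices(series):
    """Indices of interior local maxima (simple neighbor comparison)."""
    return [
        i
        for i in range(1, len(series) - 1)
        if series[i - 1] < series[i] >= series[i + 1]
    ]


def interpeak_stats(series):
    """Return (mean IPP, SD of IPP, CV in percent) for one IMF."""
    peaks = peak_indices(series)
    ipps = [b - a for a, b in zip(peaks, peaks[1:])]
    if len(ipps) < 2:
        raise ValueError("need at least three peaks to estimate the SD of the IPPs")
    m = mean(ipps)
    sd = stdev(ipps)  # assumption: sample SD
    return m, sd, sd / m * 100.0


# A perfectly regular annual cycle sampled daily for six years has a CV of zero.
annual = [math.cos(2 * math.pi * t / 365) for t in range(6 * 365)]
print(interpeak_stats(annual))
```

A low CV therefore indicates that the peaks of an IMF recur at nearly constant intervals, which is how the roughly 360-day components were singled out.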

Temporal Association between the Relevant IMFs of Daily Mean Temperatures and Stroke Events
We used Spearman's rank correlation coefficients to assess the similarity of relevant IMFs of daily mean temperatures and stroke events in Kaohsiung City and the Taipei Area. Table 4 shows the significant correlations between all IMFs except IMF4 of the daily mean temperatures and IS events in the elderly group and IMF5 of the daily mean temperatures and IS events in the middle-aged group in Kaohsiung City. The correlation coefficients were >0.5 between all IMFs of temperature and ICH in all age groups and for the IS in the elderly group in both areas. Of note, the coefficients were higher in the Taipei Area than in Kaohsiung City.
Note to Table 3: Data of the interpeak periods are presented as mean ± SD (days) for IMFs 4 to 8 of the decomposed data. The coefficients of variation, calculated as (SD/mean) × 100%, are quoted below the values. Temp: daily mean temperatures. * There are only two peaks, with one interpeak interval, for IMF8 of the daily mean temperature in the Taipei Area.

Discussion
Table 1 shows typical differences in ambient temperatures between the tropical and subtropical areas, demonstrating higher daily mean temperatures and smaller daily temperature differences and variations in Kaohsiung City compared with the Taipei Area. Although Taiwan has a comprehensive NHIRD and a unique geographic location, few studies have examined the impact of geographical differences on stroke incidence in Taiwan. Hu et al. reported that the age-adjusted first stroke incidence was 3.21 cases/1000 in Southern Taiwan and 2.95 cases/1000 in Northern Taiwan in a prospective cohort study of 8562 subjects from 1986-1990 [32]. In this study, we used events instead of the incidence or first-ever stroke as an indicator of stroke occurrence; therefore, our data were more relevant than those in previous studies, especially for the events in the elderly group. The coinciding findings on temperature and stroke events in these two studies may not merely be related to climate but may also be attributable to differences in stroke risk factors such as hypertension, diabetes, and sex.
Although several risk factors including age, sex, hypertension, and diabetes were not analyzed in the current study, our results clearly demonstrate that annual IS and ICH events were higher in Kaohsiung City than in the Taipei Area. Overall, IS events were also more frequent than ICH events, which is similar to previous hospital-based studies showing that IS was more common than ICH among Chinese people. The IS:ICH ratio was 1.59 to 2.31 in community-based studies and much higher in hospital-based studies [33]. Our results demonstrated that the IS to ICH ratios were 1.34 and 1.36 in Kaohsiung City and the Taipei Area, respectively. After age stratification, the current study found that the ratios were 0.70 and 0.75 in the middle-aged groups and 1.86 and 1.80 in the elderly groups in Kaohsiung City and the Taipei Area, respectively. These results indicate that ICH events are more common than IS events in younger subjects. Similar results were found in a survey of a nationwide insurance database in South Korea [9]. The possible mechanism remains uncertain, and we found no related articles.
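The IS:ICH ratios quoted above follow directly from the event rates reported in the Results. The short Python check below recomputes them; the dictionary layout and function name are illustrative only.

```python
# Event rates per 1000 persons per year, (IS, ICH), as reported in the Results.
RATES = {
    "Kaohsiung City": {
        "all": (3.09, 2.30),
        "middle-aged": (0.87, 1.25),
        "elderly": (12.86, 6.93),
    },
    "Taipei Area": {
        "all": (2.25, 1.66),
        "middle-aged": (0.63, 0.84),
        "elderly": (10.18, 5.65),
    },
}


def is_to_ich_ratios(rates):
    """Return the IS:ICH ratio, rounded to two decimals, for every area and group."""
    return {
        area: {group: round(is_rate / ich_rate, 2) for group, (is_rate, ich_rate) in groups.items()}
        for area, groups in rates.items()
    }


for area, ratios in is_to_ich_ratios(RATES).items():
    print(area, ratios)
```

Ratios below 1 in the middle-aged groups correspond to ICH events outnumbering IS events in younger subjects.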
The association between ambient temperature and stroke incidence remains uncertain. A meta-analysis that recruited 19,736 stroke patients, including 14,199 IS and 3798 ICH, from 26 articles suggested that IS and ICH have a negative correlation with ambient temperature [16]. Although Figure 2 shows a tendency for an inverse relationship between the monthly ambient temperature and stroke events, there was only a mild to moderate correlation between ICH events and ambient temperature. Only IS occurrences had a strongly inverse association with ambient temperature in the elderly group in the Taipei Area by Spearman's rank correlation coefficient (Table 2). The seasonality of the occurrences of IS was not demonstrated in the NHIRD during the period 1998 to 2003 [34].
Our raw data show that the daily events of stroke varied irregularly between about 0 and 30 events/day. EEMD is a powerful analytic tool when dealing with >2000 tiny irregular data points after adding adequate white noise. This method can isolate and extract physically meaningful IMFs from the original signals. For example, EEMD demonstrated the El Niño-southern oscillation events by decomposing sea-level atmospheric pressure and temperature [31]. This method was also used to study the association between environmental or meteorological factors and monthly suicide rates or headaches [35], or to predict COVID-19 endemics [36].
In the current study, for example, the EEMD decomposed the daily mean temperatures and events of ICH of the middle-aged group in the Taipei Area into several IMFs. There were 6-period IMF6 and IMF7 in the decomposed data, respectively (Figure 3). Although EEMD can decompose intermixed oscillating signals into several mono-component intrinsic mode functions, the periods of IMFs of temperature and stroke events in all groups are not consistent (Table 3). The meaning of each IMF is not certain, except for IMF1, which is the added white noise. We found high-frequency rhythms in IMF2 and IMF3, which are not shown in Table 3 or the text. These high-frequency IMFs may be the daily or weekly periodic changes of the signals. However, they are not consistent and do not show good correlations between the ambient temperature and stroke events. There are 6-period IMFs at intervals of around 360 days, which were the most consistent decomposed IMFs of all the raw data in both studied areas (Table 3, Figure 4). As an example, these 6-period IMFs of daily events of ICH had good correlation coefficients with the daily mean temperatures in the Taipei Area (Figure 5). Table 4 summarizes the correlation coefficients of decomposed IMFs of daily stroke events with the IMF of mean temperatures with similar intervals in Kaohsiung City and the Taipei Area. There were inverse associations between daily stroke events and mean temperatures. These were more prominent for the daily events of ICH in any age group and the daily occurrences of IS in the elderly group in both studied areas. Similar results were found in a meta-analysis, which revealed that elderly subjects had an increased incidence of IS and ICH in cold weather [15].
There were statistically significant correlations between temperature and stroke events in most groups. Moderate-to-strong correlations with temperature, defined as coefficients > 0.5 [37], were found in the 5-6-period IMFs of decomposed ICH in both age groups and IS in the elderly group in Kaohsiung City and the Taipei Area by Spearman's rank correlation. Of note, the correlations were stronger in the Taipei Area (subtropical area) than in Kaohsiung City (tropical area). A similar method was applied to study the spatial-temporal association of environmental factors, including temperature, soil moisture, and photosynthetically active radiation [38].
Our data show obvious correlations between ambient temperature and stroke events, which were more prominent in the subtropical area. These results differ from those in the meta-analysis, which suggested that the associations decreased with latitude [16]. Unlike the meta-analysis, the current study used a cohort in two areas at different latitudes and climate zones from a large dataset within the same healthcare system. Our results are therefore more relevant to the situation in the real world. Nevertheless, both studies demonstrated that associations between ambient temperature and stroke incidence were more apparent in ICH.
A limitation of the current study was that the risk factors of stroke, which were not included in our initial data mining, were not analyzed. We propose that remote risk factors may not influence the dynamic daily changes of stroke events. Therefore, our study showed a clear temporal association between stroke events and ambient temperature in tropical and subtropical areas. Meanwhile, we could not analyze the differences in risk factors between these two studied areas. These may be confounding or major factors influencing the differences in the associations between Kaohsiung City and the Taipei Area. The data on risk factors were not available in our original databank, and relevant articles are not available in the literature either. Nevertheless, based on the nationwide equal healthcare system and similar age distribution between these two studied areas, our study still offers novel and useful information about the possible influence of latitude on the association between ambient temperature and stroke events as compared with the current meta-analyses.

Conclusions
Our study of the nationwide insurance claim data shows that the events of stroke tend to be inversely associated with ambient temperature. The correlations became obvious after the nonlinear decomposition by EEMD. These findings suggest that cold weather may be a risk factor for stroke events. The method used to demonstrate these temporal associations can hopefully promote further studies of similar dynamic associations between environmental factors and occurrences of stroke, myocardial infarction, endemic and pandemic diseases, etc. Meanwhile, the correlation coefficients were higher in the subtropical area than those in the tropical area. The importance of the 360-day IMFs of stroke events and the possible influence of latitude or climate on daily stroke events have not been reported in the past. These findings offer potential research prospects on the temporal and geographic associations between environmental factors and disease events by using the unique NHIRD.