Does the Short-Term Effect of Air Pollution Influence the Incidence of Spontaneous Intracerebral Hemorrhage in Different Patient Groups? Big Data Analysis in Taiwan

Spontaneous intracerebral hemorrhage (sICH) has a high mortality rate. Research has demonstrated that the occurrence of sICH is related to air pollution. This study used big data analysis to explore the impact of air pollution on the risk of sICH in patients of differing age and geographic location. In total, 39,053 cases were included in this study: 14,041 in the Taipei region (Taipei City and New Taipei City), 5537 in Taoyuan City, 7654 in Taichung City, 4739 in Tainan City, and 7082 in Kaohsiung City. Correlation analysis identified two groups of closely correlated pollutants: the CO and NO2 group and the PM2.5 and PM10 group. Furthermore, the correlations of sICH with air pollutants varied between age groups. The co-factors of the influence of air pollutants in the different age groups were explored using regression analysis. This study integrated Taiwan National Health Insurance data and air pollution data to explore the risk factors of sICH using big data analytics. We found that PM2.5 and PM10 are very important risk factors for sICH, and that age is an important factor modulating the influence of air pollutants on the incidence of sICH.


Introduction
It is well known that pollution affects health and contributes to cardiovascular, cerebrovascular, pulmonary, and other diseases [1-3]. Evidence has indicated that air pollutants including particulate matter (PM), ozone (O3), nitrogen dioxide (NO2), sulfur dioxide (SO2) and carbon monoxide (CO) influence health [4,5]. Spontaneous intracerebral hemorrhage (sICH) accounts for 10-35% of strokes; its incidence ranges from 10 to 60 cases per 100,000 population per year, and its mortality rate is high [6-8]. Research has indicated that air pollution is correlated with the incidence of stroke and with hospital admissions due to stroke, but different research groups have reached different conclusions regarding the correlations between sICH and air pollutants [4,9-12]. The Taiwan National Health Insurance Research Database (NHIRD) is a useful longitudinal dataset for financial and epidemiological research. Many studies have used the NHIRD to explore medical issues [8,13-15]; however, few studies have integrated the NHIRD with other datasets. The dataset used in this study included outpatient and inpatient claims data, with detailed longitudinal information for each visit/stay [16]. This study integrated the NHIRD and governmental open data using big data analysis methods and evaluated the correlations between air pollutants and sICH in Taiwanese patients, focusing on short-term effects (exposure to air pollution over 1 to 24 h).

Data Sources
This study integrated the National Health Insurance Research Database (NHIRD), the household registration database of the Department of Household Registration, the 2010 population and housing census, and the air pollutants data derived from the government open data platform in Taiwan with big data analytics systems using the platform of the Innovation Center for Big Data and Digital Convergence, Yuan Ze University [16,17].

Data Protection and Permission
The personal information of all subjects was encrypted using a double scrambling protocol for research purposes to protect patient privacy. All researchers who wish to use the NHIRD and its data subsets are required to sign a written agreement declaring that they have no intention of obtaining information that could potentially violate the privacy of patients or care providers. This study was approved by the Institutional Review Board (IRB) of Taipei Hospital (IRB Approval Number: TH-IRB-0015-0003), and the protocol was evaluated by the National Health Research Institutes (NHRI), which consented to this planned analysis of the NHIRD (Agreement Number: NHIRD-104-183).

Data Management
The inclusion criterion for patients in this research was first-attack sICH, identified by a principal diagnosis code of 431 in the International Classification of Diseases, Ninth Revision (ICD-9). The five regions analyzed were the Taipei region (Taipei City and New Taipei City), Taoyuan City, Taichung City, Tainan City and Kaohsiung City. In total, 42,360 sICH cases were registered from 2007 to 2011 in the five regions. Patients who were admitted due to traumatic intracranial hemorrhage (TICH) (ICD-9 codes 800.00 to 804.99, 850.00 to 854.19, 959.01, and 959.09) were excluded (3307 cases) [8], and the remaining 39,053 cases were analyzed in this study. Data for six air pollutants were extracted from the Taiwan government open data platform, then cleaned and merged: carbon monoxide (CO), nitrogen dioxide (NO2), ozone (O3), particulate matter with a diameter of 10 µm or less (PM10) and of 2.5 µm or less (PM2.5), and sulfur dioxide (SO2). Because certain sensors were broken, some data were missing. Missing values were filled using linear interpolation, and the average concentrations of each air pollutant in the five regions were then calculated as the daily concentrations. The pollutant data were merged with the patient data by matching the observation date of each air pollutant with the patients' admission dates. Finally, the merged dataset was analyzed (Section 2.4). A flow chart of data management in this study is shown in Figure 1. The cut-off values indicating abnormal pollution levels, the mean concentrations, and the number of days on which pollutant levels exceeded normal levels in the five regions were also collected.
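The gap-filling and averaging steps described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the station names and readings are hypothetical, and it assumes that the regional daily concentration is the mean over all stations in a region and that gaps at the start or end of a series are left unfilled, neither of which is stated in the text.

```python
from __future__ import annotations

from typing import Optional


def interpolate_linear(values: list[Optional[float]]) -> list[Optional[float]]:
    """Fill interior gaps (None) by linear interpolation between the nearest
    known neighbours. Leading and trailing gaps stay None (an assumption)."""
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            weight = (i - left) / span
            filled[i] = values[left] + weight * (values[right] - values[left])
    return filled


def regional_daily_mean(
    stations: dict[str, list[Optional[float]]]
) -> list[Optional[float]]:
    """Average the gap-filled series of all stations in one region, day by day."""
    series = [interpolate_linear(s) for s in stations.values()]
    means: list[Optional[float]] = []
    for day_values in zip(*series):
        present = [v for v in day_values if v is not None]
        means.append(sum(present) / len(present) if present else None)
    return means


# Hypothetical daily PM2.5 readings (µg/m3); None marks a sensor outage.
stations = {
    "station_a": [20.0, None, None, 26.0],
    "station_b": [30.0, 32.0, None, 36.0],
}
daily = regional_daily_mean(stations)
print(daily)
```

The resulting regional series can then be joined to admissions by calendar date, which is the merge step the paragraph describes.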

Statistics
Patient characteristics including gender, age, incidence of sICH, percentage of patients with a low income, Charlson Comorbidity Index (CCI) [18,19] and total hospital length of stay (LOS) were collected by region. The Pearson correlation coefficient and stepwise regression were used to examine the relationships between variables. In this study, the degree of correlation was defined as follows: high correlation was defined as a correlation coefficient larger than 0.6; moderate correlation was defined as a correlation coefficient between 0.3 and 0.6; and low correlation was defined as a correlation coefficient lower than 0.3. Statistical analysis was conducted using SPSS version 19.0 (SPSS Inc., Chicago, IL, USA). The big data analysis and visualization tools were constructed by the Innovation Center for Big Data and Digital Convergence, Yuan Ze University [20]. Statistical significance was defined as p < 0.05.
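As an illustration of the correlation grading defined above, the following sketch computes a Pearson coefficient and assigns it to one of the three bands. The study itself used SPSS; this code is only a stand-in. Two points are assumptions because the text does not settle them: the magnitude of the coefficient is graded (so negative correlations are banded by absolute value), and values of exactly 0.3 or 0.6 are treated as moderate.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def correlation_degree(r):
    """Grade a coefficient as high (> 0.6), moderate (0.3 to 0.6) or low (< 0.3).
    Using abs(r) and the handling of the exact boundaries are assumptions."""
    magnitude = abs(r)
    if magnitude > 0.6:
        return "high"
    if magnitude >= 0.3:
        return "moderate"
    return "low"


# Hypothetical monthly series, for illustration only.
pm25 = [22.0, 30.5, 41.0, 35.2, 27.8, 19.4]
cases = [610, 650, 700, 690, 640, 600]
r = pearson_r(pm25, cases)
print(round(r, 3), correlation_degree(r))
```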

Results
The correlations between the incidence of sICH and air pollutant levels in five regions of Taiwan were examined. In total, 39,053 cases were included in this study: 14,041 in the Taipei region (Taipei City and New Taipei City), 5537 in Taoyuan City, 7654 in Taichung City, 4739 in Tainan City and 7082 in Kaohsiung City. The lowest incidence of sICH was found in the Taipei region (216.4 cases per 100,000 population per year). The incidences of sICH in both Taoyuan City (280.0 cases per 100,000 population per year) and Taichung City (290.4 cases per 100,000 population per year) were higher than those of the other cities for both male and female patients. There were no significant differences in mean age or Charlson Comorbidity Index (CCI) between patients from the different regions (Table 1).
The cut-off points indicating abnormal levels of carbon monoxide (CO) and nitrogen dioxide (NO2) were 35 ppm and 250 ppb, respectively. For CO and NO2, neither the mean concentrations nor the numbers of days on which the cut-off levels were exceeded were abnormal in any of the five regions/cities investigated in this study. The cut-off point for ozone (O3) was 120 ppb. Although the mean concentration of O3 was within the normal limit, the cut-off level was exceeded on more than 300 days within the 5-year study period in every region assessed. The city with the lowest level of O3 was Taoyuan City, where the cut-off level was exceeded on only 356 days during the 5-year study period. The city with the highest level of O3 was Kaohsiung City, where the cut-off level was exceeded on 1035 days.
The cut-off level indicating abnormal PM10 pollution was 125 µg/m3. The mean concentration of PM10 was within the normal limit in all five regions/cities; however, in Tainan City (127 days) and Kaohsiung City (183 days), the cut-off level was exceeded on more than 100 days within the 5-year study period. The cut-off level indicating abnormal PM2.5 pollution was 35 µg/m3. The mean concentrations of PM2.5 in the Taipei region (26.8 µg/m3) and Taoyuan City (28.0 µg/m3) were within the normal limit. In the remaining three regions/cities, however, the cut-off level was exceeded on more than 730 days (2 years) within the 5-year study period. The levels in Taipei and Taoyuan City exceeded the cut-off point on fewer days (i.e., less than 2 years), but still on more than 400 days. The cut-off level indicating an abnormal amount of sulfur dioxide (SO2) was 100 ppb. The mean concentrations of SO2 were within the normal limit in all five regions/cities (Table 2).
Regarding correlations between air pollutants, two groups were found to have extremely high correlations: the CO and NO2 group (r = 0.939, p < 0.01) and the PM10 and PM2.5 group (r = 0.969, p < 0.001). All other pollutants had high or moderate correlations with PM10 and PM2.5. CO had moderate correlations with PM10 (r = 0.399, p < 0.01) and PM2.5 (r = 0.463, p < 0.001). NO2 had high correlations with PM10 (r = 0.610, p < 0.001) and PM2.5 (r = 0.637, p < 0.001) and a moderate correlation with SO2 (r = 0.422, p < 0.001). O3 had moderate correlations with PM10 (r = 0.372, p < 0.01) and PM2.5 (r = 0.343, p < 0.01). SO2 had moderate correlations with PM10 (r = 0.550, p < 0.001) and PM2.5 (r = 0.521, p < 0.001) (Table 3).
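The Results contrast two summaries of the same series: the mean concentration and the number of days above the cut-off. A short sketch shows how a pollutant can have a mean within the limit while still exceeding the cut-off on many days. The cut-off values are those quoted in the text for O3, PM10, PM2.5 and SO2; the readings are hypothetical, and treating a day as an exceedance only when its value is strictly above the cut-off is an assumption.

```python
from __future__ import annotations

from typing import Optional

# Cut-off values quoted in the text (O3 and SO2 in ppb; PM10 and PM2.5 in µg/m3).
CUTOFFS = {"O3": 120.0, "PM10": 125.0, "PM2.5": 35.0, "SO2": 100.0}


def summarize(pollutant: str, daily_values: list[Optional[float]]) -> dict:
    """Return the mean concentration and the number of exceedance days.
    Days without a reading (None) are skipped."""
    cutoff = CUTOFFS[pollutant]
    present = [v for v in daily_values if v is not None]
    if not present:
        raise ValueError("no readings available")
    mean = sum(present) / len(present)
    return {
        "mean": mean,
        "mean_exceeds_cutoff": mean > cutoff,
        # Strictly greater than the cut-off counts as an exceedance (assumption).
        "exceedance_days": sum(1 for v in present if v > cutoff),
    }


# Hypothetical week of daily PM2.5 values; None marks a missing reading.
pm25_week = [22.0, 41.5, 35.0, None, 36.2, 18.9, 50.1]
summary = summarize("PM2.5", pm25_week)
print(summary)
```

In this made-up week the mean stays below 35 µg/m3 even though three days exceed it, which mirrors the pattern reported for PM2.5 in the Taipei region and Taoyuan City.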
In this study, the co-factors related to the influences of air pollutants on the incidence of sICH in different age groups were evaluated using regression analysis. Because of the extremely high correlations within the two groups described above, PM10 and CO were removed from the analysis. Three models were used to examine the correlations of the monthly incidence of sICH with air pollutants. Model 1 included only PM2.5; the adjusted R2 of Model 1 was 0.417. Although the two other models also had satisfactory adjusted R2 values (0.459 and 0.490), the additional factors in these models (O3 and SO2) had negative coefficients, which is not plausible, as it would mean that the higher the concentration of these air pollutants, the lower the monthly incidence of sICH. In addition, the F values of the change in R2 were lower than that of Model 1. Therefore, Models 2 and 3 were rejected (Table 4).
In terms of age, there were no significant correlations between the air pollutants and the monthly incidence of sICH in patients under 25 years of age. In patients aged between 25 and 44 years, only one factor (NO2) included in the regression model was significantly correlated with the monthly incidence of sICH; however, the adjusted R2 (0.101) was too low to be accepted. In the regression models for patients aged between 45 and 64 years, the same problem arose for Models 2, 3 and 4, in that O3, SO2 and NO2 were shown to have negative influences. Only Model 1, which contained only PM2.5, was acceptable, with a satisfactory adjusted R2 (0.474). For patients aged over 80 years, only NO2 was included in the regression model, the adjusted R2 of which was 0.211 (Table 5).
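The adjusted R2 values used above to compare models penalize R2 for the number of predictors. A minimal sketch of a one-predictor model (analogous in form to Model 1, with PM2.5 as the only predictor) is given below. It is not the SPSS stepwise procedure used in the study, and the monthly values are hypothetical.

```python
def fit_simple_ols(x, y):
    """Ordinary least squares for y = intercept + slope * x.
    Returns (slope, intercept, r_squared)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length >= 3")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or ss_tot == 0:
        raise ValueError("x and y must both vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return slope, intercept, 1.0 - ss_res / ss_tot


def adjusted_r2(r_squared, n_obs, n_predictors):
    """Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - p - 1)."""
    if n_obs - n_predictors - 1 <= 0:
        raise ValueError("too few observations for this many predictors")
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Hypothetical monthly PM2.5 means (µg/m3) and monthly sICH case counts.
pm25 = [20.0, 25.0, 30.0, 35.0, 40.0, 45.0]
cases = [100, 108, 109, 121, 118, 130]
slope, intercept, r2 = fit_simple_ols(pm25, cases)
print(slope, intercept, r2, adjusted_r2(r2, len(pm25), 1))
```

A positive slope corresponds to the plausible direction of effect; the negative coefficients reported for O3 and SO2 are the reason the larger models were not accepted despite their higher adjusted R2.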

Discussion
Recent studies have found that PM2.5 and PM10 have great impacts on human health, especially PM2.5 [21]. However, the conclusions reached regarding the correlations of ambient PM2.5 and PM10 concentrations with the risk of sICH have been inconsistent [4,9,22-24]. Tsai et al. [4] concluded that air pollutants (PM10, O3, NO2, SO2 and CO) are highly correlated with sICH admissions in warm weather (ambient temperature > 20 °C). Xiang et al. [9] evaluated the air pollutants NO2, SO2 and PM10, and concluded that no significant correlations existed between the air pollutants and sICH admissions in warm weather, but reported that NO2 is significantly associated with stroke during cold weather. A meta-analysis of ambient particulate matter levels concluded that PM2.5 and PM10 have no influence on the risk of sICH in patients of any age [22]. In the present study, the concentrations of PM2.5 and PM10 were very highly correlated, and it was concluded that PM2.5 and PM10 are important risk factors for sICH. In addition, because of the significant correlation between PM10 and PM2.5, a regression model containing only PM2.5 was fitted to the number of cases of sICH per month (adjusted R2 = 0.417).
Age was found to be an important modulating factor of the effect of air pollutants on the incidence of sICH. Different levels of influence of air pollution on the risk of sICH were observed in different age groups. We found high correlations between sICH and PM10, PM2.5 and NO2 in the middle-aged and elderly patient groups. The concentration of NO2 was also correlated with the incidence of sICH in patients aged between 25 and 44 years in this study. We used regression analysis to adjust for the correlations among the pollutants. Previous research has reached inconsistent conclusions regarding the influence of NO2 on the risk of sICH [4,10-12,23,24]. In this study, when other air pollutant factors were controlled, NO2 was found to influence the risk of sICH in patients aged over 80 years (R2 = 0.211). We also found that the ambient levels of air pollutants did not influence the risk of sICH in younger patients (≤44 years of age). PM2.5 and PM10 were found to be very important risk factors for sICH in middle-aged and elderly sICH patients (45-79 years of age).
Previous studies have shown that air pollutants are highly correlated with the incidence of stroke in Taiwan [4,23,24]. In this study, the levels of CO, NO2 and SO2 did not exceed normal levels on any day in any of the regions examined (Table 2). We believe that this is good evidence of pollution control by the Taiwan government in the modern era. However, we found that NO2 pollution was still highly correlated with the incidence of sICH, and the number of days on which the levels of PM10, PM2.5 and O3 exceeded normal levels remained high. Moreover, the mean concentration of PM2.5 in some cities was higher than the cut-off level, indicating abnormally high levels. This raises the question of whether the cut-off levels at which air pollutant concentrations are considered abnormal are too high to prevent the pollutants from influencing the incidence of disease. The air pollution control policies and cut-off levels may therefore need to be readjusted following further in-depth research.
We found that most of the air pollutants had moderate to high correlations with one another, except O3. A possible reason is that O3 is formed when hydrocarbons and nitrogen oxides react in sunlight and may be carried many kilometers by wind, whereas the other pollutants are produced directly by car engines or industrial operations. Most of the air pollutants in this study were correlated with PM10 and PM2.5. Because most ambient particulate matter is a heterogeneous mixture of various compounds, such as organic and elemental carbon, metals, sulfates, nitrates, and some microorganisms [22], PM2.5 and PM10 are expected to be highly correlated with the levels of other pollutants. We therefore adjusted for these correlations using regression analysis.
This study has limitations. The locations of the air pollutant detection stations differed from the locations of sICH patients when they suffered an attack, and the time frames of pollutant measurement and the occurrence of sICH also differed. Other researchers have also identified a time lag [23], in that patients were perhaps not sent to hospital immediately after suffering sICH.

Conclusions
As sICH is an emergency condition for which most patients are sent to the ER at once, the results of this study can be considered reliable. This is a pilot study of the short-term effect of these pollution factors on sICH. We found that air pollution still influences the incidence of sICH. NO2 pollution was still highly correlated with the incidence of sICH, and the number of days on which the levels of PM10, PM2.5 and O3 exceeded normal levels remained high. In addition, age was found to be an important modulating factor of the effect of air pollutants on the incidence of sICH. There were high correlations between sICH and PM10, PM2.5 and NO2 in the middle-aged and elderly patient groups. Furthermore, PM2.5 and PM10 were found to be very important risk factors for sICH in middle-aged and elderly sICH patients. The reason is still unclear and will need to be further investigated. In the future, laboratory data and temperature may be included, and multivariate analysis can be used to detect the influences of air pollutants more precisely. Air pollution control policies and cut-off levels may therefore need to be readjusted following further in-depth research.