Differences in Spontaneous Intracerebral Hemorrhage Cases between Urban and Rural Regions of Taiwan: Big Data Analytics of Government Open Data

This study used big data analysis to evaluate the differences in spontaneous intracerebral hemorrhage (sICH) between rural and urban areas of Taiwan. We used big data analytics and visualization tools to examine government open data, which included residents' health and medical administrative data, economic status, educational status, and other relevant information. The study subjects were sICH patients in the Taipei region (29,741 cases) and Eastern Taiwan (4565 cases). The incidence of sICH per 100,000 population per year in Eastern Taiwan (71.3 cases) was significantly higher than that in the Taipei region (42.3 cases). The mean coverage area per hospital in Eastern Taiwan (452.4 km²) was significantly larger than that in the Taipei region (24 km²). The residents' educational level in the Taipei region was significantly higher than that in Eastern Taiwan. The mean hospital length of stay in the Taipei region (17.9 days) was significantly greater than that in Eastern Taiwan (16.3 days) (p < 0.001). There were no significant differences in other medical profiles between the two areas. Distance and educational barriers were two possible reasons for the higher incidence of sICH in the rural area of Eastern Taiwan. Further studies are necessary in order to understand these phenomena in greater depth.


Introduction
Spontaneous intracerebral hemorrhage (sICH) incurs high medical expenditure, especially for emergency and intensive care facilities [1,2]. Moreover, many sICH patients have multiple chronic conditions [2,3]. Hypertension and diabetes mellitus are the most frequent comorbidities among sICH patients before the attack [2,3]. Continuity of care for patients with these comorbidities decreases the risk of sICH [3] and is a key performance index for the quality of care for chronic diseases. The disparity in health care facilities between rural and urban areas is large [4][5][6]. The incidences of some diseases and public health problems differ significantly in rural areas because of differences in health care accessibility. Rural areas have fewer health care facilities, and this influences the outcome of health care [6]. Socioeconomic status and educational status also influence health care accessibility and the outcome of health care [5,7,8]. In China, stroke patients with a lower income status, a lower educational status and a rural location were found to have higher mortality rates [7]. Thus, it is important to use big data to analyze the differences in sICH between rural and urban regions.
Int. J. Environ. Res. Public Health 2017, 14, 1548
The concept of big data has been defined by four Vs: the large amount of data (volume), the diversity of data structures (variety), rapid data access and data management (velocity), and the quality of being true, real-world data (veracity) [9]. Many researchers have used administrative data related to health care [1]. However, few have integrated these data with other government open data. This study integrated Taiwan National Health Insurance data [10][11][12], which cover 99% of Taiwan's population (about 23 million residents), with government open data, including the household registration database of the Department of Household Registration (http://www.ris.gov.tw/en/web/ris3-english/home) [13], the 2010 population and housing census (http://ebas1.ebas.gov.tw/phc2010/chinese/rchome.htm) [14], smoker statistics data and Taiwan Geographic Information Systems (GIS) data (https://data.gov.tw/) [15], and used big data analytics and visualization tools to evaluate the status of sICH in rural and urban areas of Taiwan.

Data Source
This study integrated the National Health Insurance Research Database (NHIRD), the household registration database of the Department of Household Registration, the 2010 population and housing census, and the Government Open Data Platform in Taiwan with big data analytics systems using the platform of the Innovation Center for Big Data and Digital Convergence.

Data Protection and Permission
The personal information of all subjects was encrypted using a double scrambling protocol for research purposes to protect patient privacy. All researchers who wish to use the NHIRD and its data subsets are required to sign a written agreement declaring that they have no intention of obtaining information that could potentially violate the privacy of patients or care providers. This study was approved by the Institutional Review Board (IRB) of the Taipei Hospital (IRB Approval Number: TH-IRB-0015-0003), and the protocol was evaluated by the National Health Research Institutes (NHRI), which consented to this planned analysis of the NHIRD (Agreement Number: NHIRD-104-183).

Data Management
The inclusion criterion in this study was first-attack sICH, identified by a principal diagnosis code of 431 in the International Classification of Diseases, 9th Revision (ICD-9). There were 128,173 sICH cases registered from 2001 to 2011. Patients admitted due to traumatic intracranial hemorrhage (TICH) (ICD-9 codes: 800 to 804.99, 850 to 854.19, 959.01, and 959.09) were excluded [2], resulting in 1753 cases being excluded from this study. The Taipei region (Taipei City and New Taipei City, 29,741 cases) and Eastern Taiwan (Taitung and Hualian, 4565 cases) were included in this study (Figure 1).
Information on patients' profiles and hospital admission expenditures was collected. A prolonged ICU stay of sICH patients was defined as admission to the ICU for more than 10 days [1]. The Charlson Comorbidity Index (CCI) was used to evaluate the severity of a patient's condition [16,17].
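The inclusion and exclusion criteria above can be sketched as a simple filter. This is only an illustration under stated assumptions: the `Admission` record layout is hypothetical and is not the NHIRD schema, the codes are assumed to be stored as dotted strings such as "852.01", and the traumatic codes are assumed to be looked up among the secondary diagnoses. The study's actual data processing is not described at this level of detail.

```python
from dataclasses import dataclass, field

# Traumatic intracranial hemorrhage (TICH) exclusion codes listed in the text.
TICH_RANGES = [(800.0, 804.99), (850.0, 854.19)]
TICH_SINGLE = {"959.01", "959.09"}


def is_tich(code: str) -> bool:
    """Return True if a dotted ICD-9 code string is on the TICH exclusion list."""
    if code in TICH_SINGLE:
        return True
    try:
        value = float(code)
    except ValueError:  # E/V codes or malformed entries are not in the numeric ranges
        return False
    return any(low <= value <= high for low, high in TICH_RANGES)


@dataclass
class Admission:  # hypothetical record layout, not the NHIRD schema
    patient_id: str
    principal_dx: str
    admit_date: str  # ISO date string, e.g. "2005-03-01"
    secondary_dx: list = field(default_factory=list)


def is_sich_case(adm: Admission) -> bool:
    """Principal diagnosis 431 and no traumatic code among the secondary diagnoses."""
    if adm.principal_dx != "431":
        return False
    return not any(is_tich(code) for code in adm.secondary_dx)


def first_attacks(admissions):
    """Keep each patient's earliest qualifying admission (the first attack)."""
    first = {}
    for adm in sorted(admissions, key=lambda a: a.admit_date):
        if is_sich_case(adm) and adm.patient_id not in first:
            first[adm.patient_id] = adm
    return list(first.values())
```

Sorting by an ISO date string keeps the sketch free of date parsing; a real pipeline would more likely use proper date types and the database's own field names.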

Statistics and Data Analysis
The Student t-test was used to analyze the continuous data, and the χ² test was used for the categorical data. Statistical analysis was conducted using SPSS version 12.0 (SPSS Inc., Chicago, IL, USA). The big data analytics and visualization tools were constructed at the Innovation Center for Big Data and Digital Convergence, Yuan Ze University. Statistical significance was set at p < 0.05.
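For readers without access to SPSS, the two tests named above can be approximated from summary statistics with the Python standard library. The sketch below is not the authors' analysis. It assumes a pooled-variance Student t statistic, uses the normal distribution as a large-sample stand-in for the t distribution, and computes the Pearson χ² test without continuity correction for a 2 × 2 table. The group sizes in the example are the regional case totals, which is an assumption, because the number of patients contributing to each reported mean is not stated.

```python
import math


def student_t_from_summary(m1, s1, n1, m2, s2, n2):
    """Pooled-variance Student t statistic and its degrees of freedom."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    se = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (m1 - m2) / se, df


def two_sided_p_normal(z):
    """Two-sided p-value from the standard normal distribution.

    Only a large-sample approximation to the t distribution.
    """
    return math.erfc(abs(z) / math.sqrt(2.0))


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(stat / 2.0))  # exact tail probability for 1 degree of freedom
    return stat, p


# Hospital length of stay (mean, SD) as reported in the Results; using the
# regional case totals as group sizes is an assumption of this sketch.
t_stat, df = student_t_from_summary(17.9, 15.13, 29741, 16.3, 13.82, 4565)
p_value = two_sided_p_normal(t_stat)
print(f"t = {t_stat:.2f}, df = {df}, p = {p_value:.2g}")
```

Under these assumptions, the length-of-stay comparison gives a p-value far below 0.001, in line with the reported result.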

Results
Taipei City and New Taipei City (Taipei region) are urban areas in Taiwan. Taitung and Hualian (Eastern Taiwan) are rural areas in Taiwan (Figure 2). The low-income resident proportion in Eastern Taiwan (3.1%) was more than twice that in the Taipei region (1.3%). Regarding educational status, the proportion of residents with a university/college level of education in Eastern Taiwan (25.5%) was significantly lower than that in the Taipei region (46.7%) (Table 1). The mean coverage area per hospital in Eastern Taiwan (452.4 km²) was significantly larger than that in the Taipei region (24 km²) (p < 0.001). The total numbers of beds and ICU beds in the Taipei region (33,570 beds/1861 ICU beds) were significantly greater than those in Eastern Taiwan (5588 beds/238 ICU beds) (p < 0.001). The same was true for the medical staffing level: the numbers of medical staff and physicians in the Taipei region (44,556/8529 persons) were also significantly higher than those in Eastern Taiwan (4555/733 persons) (p < 0.001) (Table 2).
When only permanent residents were considered, there were further differences between Eastern Taiwan and the Taipei region. The numbers of hospital beds/ICU beds per 10,000 population in Eastern Taiwan (109.4/4.7 beds) were significantly greater than those in the Taipei region (50/2.8 beds) (p < 0.001). Interestingly, the number of medical staff in Eastern Taiwan (78.3 staff per 10,000 population) was significantly higher than that in the Taipei region (69.7 staff per 10,000 population) (p < 0.001). Taking only permanent residents into account, the number of medical staff in Eastern Taiwan (89.1 staff per 10,000 population) was still significantly greater than that in the Taipei region (66.4 staff per 10,000 population) (p < 0.001), and the number of physicians in Eastern Taiwan (14.3 physicians per 10,000 population) was significantly higher than that in the Taipei region (12.7 physicians per 10,000 population) (p < 0.01) (Table 2).

The total numbers of sICH cases in the Taipei region and Eastern Taiwan were 29,741 (male/female = 62.5%/37.5%) and 4565 (male/female = 63.7%/36.3%), respectively. However, the incidence in Eastern Taiwan (71.3 cases per 100,000 population per year) was significantly higher than that in the Taipei region (42.3 cases per 100,000 population per year). There were no significant differences in the mean ages of either the male or the female sICH patients between the two areas (Table 3). The incidence of sICH increased with age; however, the increasing trend was more pronounced in Eastern Taiwan than in the Taipei region in both the male and female populations (p < 0.001) (Figure 3).
Because veteran patients are of a relatively low economic status in Taiwan, this study aggregated low-income patients and veteran patients. The proportion of low-income and veteran sICH patients in Eastern Taiwan (10.6%) did not differ significantly from that in the Taipei region (8.5%). The Charlson Comorbidity Index (CCI) was also calculated in this study, and there was no significant difference in the CCI score between the two areas (Table 3).
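The incidence figures reported above follow the usual crude-rate formula: cases divided by person-years, scaled to 100,000. The sketch below is illustrative only. The denominators used in the study are not stated, so instead of assuming population figures it inverts the formula to show what mean population each reported incidence implies. The 11-year window (2001 through 2011, inclusive) is assumed to apply to the regional case counts.

```python
def incidence_per_100k(cases, mean_population, years):
    """Crude incidence per 100,000 population per year."""
    return cases / (mean_population * years) * 100_000


def implied_mean_population(cases, incidence, years):
    """Invert the formula: the mean population consistent with a reported incidence."""
    return cases / (incidence * years) * 100_000


YEARS = 11  # 2001 through 2011 inclusive; assumed to cover the regional counts

# Case counts and incidences as reported in the Results.
reported = {
    "Taipei region": {"cases": 29_741, "incidence": 42.3},
    "Eastern Taiwan": {"cases": 4_565, "incidence": 71.3},
}

for region, r in reported.items():
    pop = implied_mean_population(r["cases"], r["incidence"], YEARS)
    print(f"{region}: implied mean population of about {pop:,.0f}")

rate_ratio = reported["Eastern Taiwan"]["incidence"] / reported["Taipei region"]["incidence"]
print(f"Incidence rate ratio (Eastern Taiwan / Taipei region): {rate_ratio:.2f}")
```

The rate ratio is about 1.7. In other words, the reported incidence in Eastern Taiwan was roughly 70% higher than that in the Taipei region.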
Regarding the usage of medical facilities and medical expenditure, the mean hospital length of stay (LOS) in the Taipei region (17.9 days, SD = 15.13) was significantly longer than that in Eastern Taiwan (16.3 days, SD = 13.82) (p < 0.001), but there was no significant difference in the intensive care unit (ICU) LOS between the Taipei region (10.2 days, SD = 17.13) and Eastern Taiwan (8.4 days, SD = 10.60). There was also no significant difference between the Taipei region and Eastern Taiwan in the proportion of patients with a prolonged ICU stay (35.4% vs. 26.9%), the brain surgical rate (24.9% vs. 29.3%) or total hospital expenditure (US$4674 vs. US$4873). This study also examined the sub-classifications of hospital expenditure. The sICH patients in the Taipei region had higher medical expenditures than those in Eastern Taiwan in terms of doctor fees (US$253, SD = 212.4 vs. US$235, SD = 202.2; p < 0.001), examination fees (US$277, SD = 380.3 vs. US$225, SD = 313; p < 0.05) and medication fees (US$740, SD = 2536.7 vs. US$723, SD = 1358.1; p < 0.001) (Table 3).
Table 3. Data for spontaneous intracerebral hemorrhage (sICH) patients in the Taipei region (urban area) and Eastern Taiwan (rural area).

Discussion
Big data analytics integrates large, multi-scale data from different sources to generate new knowledge, and it is expected to revolutionize the health care system [9,18]. Visualization analysis, a big data analytics tool, can unveil hidden knowledge from big data [19]. The NHIRD (National Health Insurance Research Database) has grown to be a massive and rich database for researchers. When this large structured database is integrated with other unstructured government open data, it becomes an even more useful big data tool, and researchers are optimistic that it will yield new knowledge in the area of health care and thereby improve the quality of health care. This study focused on sICH patients, integrating Taiwan National Health Insurance data and other government open data. The geographical distribution of the sICH patients was demonstrated using the visualization tool of big data analytics, and we found that the incidence of sICH in Eastern Taiwan was significantly higher than that in other regions (Figure 2). This result inspired us to examine the possible reasons behind this finding.
The first possible reason identified in this study for the difference in the incidence of sICH was the distance barrier. Schoeps et al. [20] found that geographic distance influences the accessibility of health care and affects the survival of children. Rural regions also have disparities in chronic care, surgical intervention abilities and emergency care [21]. This study found that the mean coverage area per hospital in Eastern Taiwan (452.4 km²) was almost 19 times that in the Taipei region (24 km²). Hypertension would be better controlled, preventing sICH and other associated complications, if health care facilities were more accessible [22]. Because of the high density of hospitals in urban areas, patients there with hypertension or other comorbidities that influence sICH have easier access to health care facilities than patients in rural areas. This may explain the higher incidence of sICH in Eastern Taiwan.
Looking at the issue from another perspective, Eastern Taiwan is a longitudinal valley: the East Rift Valley is a long, narrow valley flanked by the Central Mountain Range to the west and the Coastal Mountain Range to the east. This terrain is a barrier to the accessibility of health care facilities. For example, if an sICH patient in Xiufeng Township, at the northern end of Hualien County, visited a medical centre in Eastern Taiwan, the journey would be 58.6 km as calculated using Google Maps and would take 1 h 20 min by car. Some stroke patients are unable to reach hospitals soon enough for necessary ultra-early treatment [6]. Although the numbers of medical staff and physicians per 10,000 population were not lower in Eastern Taiwan than in the Taipei region, patients in Eastern Taiwan have lower accessibility to health care facilities because of the distance barrier. The incidence of sICH may therefore be increased because of poorer care of chronic conditions.
Another possible reason for the difference in the incidence of sICH is patient education. Education improves health care-seeking behavior [5]. A low-income status, a lower educational status and a rural location influence health care accessibility and treatment outcome [7], and health education could improve patients' self-care abilities [23]. This study found that the residents of Eastern Taiwan had a lower educational status than those of the Taipei region. For future improvement in the quality of health care, the government should invest more in patient education.
This study had some limitations. First, the Taiwan National Health Insurance data were not real-time data, and patients and conditions could only be identified retrospectively. Second, the administrative data did not exactly mirror the real clinical data. For example, the patients' administrative data can be used to determine which drugs they were prescribed, but their medication adherence remains unclear. This is a significant limitation, as we were unable to calibrate the severity of illness directly and compare the outcomes between groups. Although this study calibrated the severity of illness using the Charlson Comorbidity Index (CCI), we were only able to present the outcomes and severities of patients using indirect data. In future studies, we plan to conduct sequential pattern analysis of administrative and other government open data in order to follow up cases to the final outcome for further evaluation.

Conclusions
This study is a good example of the use of big data analysis, taking advantage of data visualization tools to guide the research direction. It presented a feasible healthcare big data model for the integration of administrative data and other government open data, and such models can be used to support the government's medical decision-making. Using visualization tools, this study also found a significantly higher incidence of sICH in Eastern Taiwan than in other regions.