Hybrid Deep Learning Algorithm with Open Innovation Perspective: A Prediction Model of Asthmatic Occurrence

Due to recent advancements in industrialization, climate change and overpopulation, air pollution has become an issue of global concern, and air quality is being highlighted as a social issue. Public interest in and concern over respiratory health are increasing, as respiratory health underpins a healthy life and the social sustainability of human beings. Air pollution can have various adverse effects on human health, and respiratory diseases such as asthma, the subject of this study, are regarded as being directly affected by it. Because such pollution arises from the combined effects of atmospheric pollutants and meteorological factors, it is not easy to estimate its influence on respiratory diseases across different atmospheric environments. Previous studies have used clinical and cohort data based on relatively small samples to determine how atmospheric pollutants affect diseases such as asthma. This approach has significant limitations: each sample is likely to produce inconsistent results, and such experiments and studies are difficult for anyone outside the medical profession to carry out. This study focuses on predicting the actual asthmatic occurrence by analyzing atmospheric and meteorological data officially released by the government. We first used the vector autoregressive (VAR) model, which has traditionally been advantageous for multivariate time-series analysis, to verify that each variable has a significant causal effect on the asthmatic occurrence. Next, the VAR model was combined with a deep learning algorithm to build a prediction model optimized for asthmatic occurrence. The average error rate of the resulting hybrid deep neural network (DNN) model was numerically verified to be about 8.17%, which is better than that of the other time-series algorithms compared.
The proposed model can help manage the national health and medical insurance system and the health budget in South Korea more effectively. It can also improve the deployment of medical personnel in hospitals and the management of their supply and demand. In addition, it can contribute to the promotion of national health by enabling advance alerts, for chronic asthma patients, of the risk of outbreaks driven by the atmospheric environment. Furthermore, the methodology, experimental results and implications of this study can contribute to current discussions of global change and development, in that the proposed deep learning prediction model, driven by meteorological and environmental data, suggests a macroscopic direction toward sustainable public health and sustainability science.


Increase in Air Pollutants and Asthma
Advancements in industry, urbanization, increased human activity due to population growth and the increased consumption of resources have led to an increase in air pollutants and consequent threats to human health. Air quality is being highlighted as a social issue, and public interest in and concern over respiratory health are increasing. Air pollutants can have chronic effects on the human body and pose a great risk because their effects are expressed across large population groups [1]. For example, the London smog of 1952 resulted in a total of 12,000 deaths due to atmospheric stagnation and a surge in air pollutant concentrations, which raised public interest in the health hazards of air pollutants [2,3]. As shown in Figure 1, air pollutants generally interact through feedback among various pollutants and generate further pollutants in stages; the resulting mixtures react with the human body in complex ways and can lead to a wider variety of diseases than a single substance would. Among these, respiratory diseases are assessed to be directly related to the effects of air pollutants [4][5][6][7][8][9]. Asthma, a representative respiratory disease whose prevalence and socioeconomic burden are increasing worldwide, is thought to be associated with this increase in air pollutants [10]. In addition to air pollutants, asthma can also be worsened by allergens, occupational exposure, drugs, exercise and other factors. Among air pollutants, asthma is known to be affected specifically by particulate matter (PM), ozone (O3), nitrogen dioxide (NO2) and sulfur dioxide (SO2) [11][12][13][14]. The goal of this study is to predict asthmatic occurrence due to air pollution, which can have serious adverse effects on the human body, using open big data released by the government. First, we verified the variables that related research has identified as significant for asthma.
Second, we verified the causality and influence of these variables using the vector autoregressive (VAR) model, which is primarily used for multivariate time-series analysis. Next, the estimated VAR model was combined with a deep learning algorithm, which has gained prominence in recent years with the advent of the big data era, to construct a hybrid DNN optimized for the prediction of asthmatic occurrence. Finally, we evaluated the performance of the hybrid DNN and compared it with that of other time-series algorithms.

Effects of Air Pollutants on Asthma
In general, air pollutants that can have a significant impact on the human body can be classified into gaseous substances, such as NO2, SO2, CO and O3, and particulate matter, such as PM10 and PM2.5. Gaseous substances change the composition of the atmosphere and are usually by-products of human economic activities, such as the burning of fossil fuels. NO2, a nitrogen oxide, is considered one of the main sources of air pollution. NO2 from the burning of fossil fuels comes mainly from automobile exhaust and tends to be present in high concentrations in urbanized and industrialized areas [15,16]. Increased exposure to NO2 can increase airway hypersensitivity, and it has become clear that NO2 is directly related to asthma prevalence and reduces the lung function of asthmatic patients [17][18][19][20]. SO2 is produced mainly by the oxidation of sulfur contained in crude oil when oil is refined or burned. Previous studies have shown that exposure to large amounts of SO2 can cause airway constriction and increase the prevalence of asthma through interaction with other air pollutants such as NO2 [21][22][23]. CO is a colorless and odorless gas generated by the incomplete combustion of hydrocarbons. Automobile exhaust and combustion devices such as boilers and heaters are its main sources. In the lungs, CO binds with hemoglobin in the blood to form carboxyhemoglobin (COHb), which reduces the oxygen-carrying capacity of the blood. It can therefore interfere with respiratory metabolism and harm health [24][25][26][27]. O3 is produced by sunlight-driven photochemical reactions between nitrogen oxides and hydrocarbons from automobile exhaust. O3 concentrations tend to rise as temperature increases, so its influence is stronger in summer [28].
High atmospheric O3 concentrations can decrease lung function and increase airway hypersensitivity [29,30], and short-term exposure in a high-temperature environment, in particular, can worsen symptoms in asthmatic patients [31]. The problem of high-O3 episodes is expected to intensify in today's society, where abnormal temperatures occur frequently, partly due to global warming. Lastly, particulate matter (PM), one of the most harmful by-products of industrialization and designated a Group 1 carcinogen by the World Health Organization (WHO), is a mixture of solid and liquid particles. Particulate matter is called PM10 if the particle diameter is less than 10 µm and PM2.5 if the diameter is less than 2.5 µm. It is mainly produced through combustion in industrial processes and through chemical reactions involving the primary pollutants in automobile exhaust. Depending on particle size, PM10 tends to be deposited in the upper airway or bronchi, whereas PM2.5 can reach the small airways or alveoli and aggravate respiratory diseases [32,33]. According to previous studies, short-term exposure to high concentrations of particulate matter can worsen symptoms in asthma patients [34,35], and the influence of particulate matter has been found to differ among age groups. Robert et al. analyzed the effect of PM2.5 on asthma for four groups (under 6, 6 to 18, 19 to 49, and over 50 years old) and confirmed that children and youth aged 6-18 are at the highest risk from particulate matter [30]. Ko et al. confirmed that the influence of particulate matter on asthma varies by age group, with the strongest association between particulate matter and asthma in the group under 14 years and the greatest impact on acute exacerbation in the group aged 65 or older [36].
Overall, the influence of particulate matter on asthma is higher for the elderly and children than for adults.

Effects of Meteorological Changes on Asthma
Changes in temperature, humidity and air pressure can change the distribution of air pollutants and affect the concentration of allergens, such as pollen and mold spores, in the atmosphere, leading to worsening symptoms in asthma patients. A study by Sutyajeet et al. confirmed that high temperatures and precipitation are significantly associated with asthma outbreaks [37], and Renato et al. confirmed that climate change in certain areas can change the amount of pollen in the atmosphere, affecting allergic diseases such as asthma [38]. Antonio et al. analyzed the influence of climate on asthma and verified that a decrease in maximum air pressure and an increase in humidity could have a significant impact on asthma [39]. Overall, meteorological change, including climate change, interacts with the distribution of air pollutants and can have complex effects on asthma.

Prediction of the Asthmatic Occurrence
Asthma, caused by the combined interaction of air quality and the meteorological environment as described above, has increased in prevalence worldwide and has caused a variety of socioeconomic problems. Katayoun et al. noted that asthma places a burden on individuals and decreases productivity at the national level [40]. Patrick et al. reported that asthmatic patients are hospitalized 6.4 times more often than the general population and visit emergency rooms up to 1.8 times more often, which leads to higher medical expenses and reduced productivity and in turn increases the risk of unemployment [41]. Despite the increasing prevalence of asthma and the resulting social costs for nations and individuals, most national disease management systems do not prioritize or actively manage asthma [42,43]. Consequently, due to cost problems and the absence of a national monitoring system, most asthmatic patients visit emergency rooms or hospitals only when they have symptoms. As urbanization and industrialization accelerate, the limitation of previous work is clear: most studies have focused on environmental policy suggestions derived from clinical case-based asthma research, including factors that are not readily actionable at present.
We concluded that, from an open innovation perspective, national health policy development and budget distribution, as well as the training and deployment of emergency medical personnel in hospitals, could be made more efficient if the number of patients could be predicted accurately for the actual atmospheric environment [44,45]. Previous studies have shown that the impact of government policies can be interpreted through the VAR model for multivariate time-series analysis [46,47], and other studies have applied DNN models to predict the effectiveness of public health policies [48]. Building on these results, and unlike previous studies that utilized clinical cases and cohort data, this study focuses on building a model that predicts the future number of asthma patients after deriving the key factors that could cause asthma from long-term time-series data on the atmospheric environment.

Datasets
This study utilized a total of 10 endogenous variables for Seoul, South Korea: 6 atmospheric variables, 3 meteorological variables and the asthmatic occurrence. The data from 2015 to 2017 were used as the training set, and the data from 2018 were used as the test set to verify the performance of the prediction model.
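A chronological split like the one described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the `(date, values)` row layout, the function name `chronological_split` and the sample values are hypothetical. Splitting by calendar year rather than at random keeps every test observation after the training period, so the model is never evaluated on days that precede the data it was trained on.

```python
from datetime import date


def chronological_split(rows, first_test_year=2018):
    """Split (date, values) rows by calendar year.

    Rows dated before `first_test_year` form the training set (2015-2017
    in this study); rows within `first_test_year` form the test set (2018).
    Rows after that year are ignored.
    """
    rows = sorted(rows, key=lambda r: r[0])  # keep chronological order
    train = [r for r in rows if r[0].year < first_test_year]
    test = [r for r in rows if r[0].year == first_test_year]
    return train, test


# Hypothetical example rows: (day, [asthma count, PM10]).
rows = [
    (date(2018, 1, 2), [120, 45.0]),
    (date(2015, 1, 2), [110, 60.0]),
    (date(2017, 12, 29), [98, 38.0]),
]
train, test = chronological_split(rows)
print([r[0].isoformat() for r in train])  # ['2015-01-02', '2017-12-29']
print([r[0].isoformat() for r in test])   # ['2018-01-02']
```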

Asthmatic Occurrence
The prediction target in this study is the number of asthma cases. These data were extracted from the public data portal [49], which provides open data based on past cases managed by the Korean National Health Insurance Service. We used only the cases that occurred in Seoul, South Korea, from 2 January 2015 to 31 December 2018. Weekends and holidays were excluded to ensure the reliability of the analysis, because counts on those days are likely to underestimate the actual number of asthma patients, for example because hospitals are closed. Information on the final data is shown in Table 1 and Figure 2.
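The weekend and holiday exclusion can be sketched with Python's standard `datetime` module. The holiday set below is a hypothetical placeholder (only 1 January 2018 is listed); the study's actual holiday calendar is not given, and the helper names are illustrative.

```python
from datetime import date, timedelta

# HYPOTHETICAL holiday calendar for illustration only; the study's actual
# list of Korean public holidays is not given in the text.
HYPOTHETICAL_HOLIDAYS = {date(2018, 1, 1)}


def is_working_day(day, holidays=HYPOTHETICAL_HOLIDAYS):
    """True for Monday-Friday dates that are not listed as holidays."""
    return day.weekday() < 5 and day not in holidays  # weekday(): Mon=0 ... Sun=6


def working_days(start, end, holidays=HYPOTHETICAL_HOLIDAYS):
    """All working days from start to end, inclusive."""
    days = []
    current = start
    while current <= end:
        if is_working_day(current, holidays):
            days.append(current)
        current += timedelta(days=1)
    return days


kept = working_days(date(2018, 1, 1), date(2018, 1, 7))
print([d.isoformat() for d in kept])
# ['2018-01-02', '2018-01-03', '2018-01-04', '2018-01-05']
```

The same filter can be applied to the atmospheric and meteorological series so that all variables share one calendar of working days.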

Atmospheric Environments
The atmospheric data used in this study were extracted from the Air Korea website [50], which is managed by the Korean Ministry of Environment. The data were daily values averaged across all monitoring stations located in Seoul, South Korea, from 2 January 2015 to 31 December 2018. Air Korea collects the air pollutant concentrations measured through a network of 398 measuring stations in 112 cities and counties nationwide into the National Ambient Air Quality Monitoring Information System (NAMIS) and discloses most of the data to the public, so the data are regarded as officially reliable for analysis. We excluded data measured on weekends and holidays to keep the series consistent with the asthmatic occurrence data; the collected variables are SO2, CO, O3, NO2, PM10 and PM2.5. The data finally used are shown in Table 2 and Figure 3.
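Averaging across monitoring stations to obtain one city-wide value per day could look like the following sketch. The `(day, station_id, value)` record layout, the station names and the choice to skip missing readings are assumptions for illustration; the text does not say how missing measurements were handled.

```python
from collections import defaultdict
from datetime import date


def daily_city_average(records):
    """Average station readings into one city-wide value per day.

    `records` is an iterable of (day, station_id, value) tuples; readings
    recorded as None (missing) are skipped. Returns {day: mean value}.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for day, _station, value in records:
        if value is None:
            continue  # assumption: missing readings are simply ignored
        totals[day] += value
        counts[day] += 1
    return {day: totals[day] / counts[day] for day in totals}


# Hypothetical PM10 readings (µg/m³) from three stations.
records = [
    (date(2018, 1, 2), "station_A", 40.0),
    (date(2018, 1, 2), "station_B", 50.0),
    (date(2018, 1, 2), "station_C", None),
    (date(2018, 1, 3), "station_A", 30.0),
]
print(daily_city_average(records))
```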
As shown in Figure 3, there is a major spike in PM10 in February 2015. It results from the combined effect of yellow dust and the air-pressure conditions over southern Mongolia and northern China at that time, and a previous study confirmed that it is consistent with actually observed data [51].

Meteorological Environments
The meteorological data used in this study were extracted from the Korea Meteorological Administration (KMA) website [52] as daily averages at a monitoring station located in Seoul, South Korea, from 2 January 2015 to 31 December 2018. The KMA provides meteorological data obtained through the Automatic Synoptic Observation System (ASOS) as a public service, and the wide range of available variables and time resolutions makes the data highly useful for analysis. Data from weekends and holidays were excluded to keep the series consistent with the asthmatic occurrence data, and the data finally used are shown in Table 3 and Figure 4.

Methodology
Vector Autoregressive Model
The vector autoregressive (VAR) model, proposed by Sims [53], is a dynamic system of simultaneous equations in which the lagged values of N causally related variables enter as endogenous variables that influence one another [54]. A VAR(p) model describes an N-dimensional stationary time series $X_t = (X_{1t}, X_{2t}, X_{3t}, \ldots, X_{Nt})'$ as an autoregressive process with p time lags:

$$X_t = C + \theta_1 X_{t-1} + \theta_2 X_{t-2} + \cdots + \theta_p X_{t-p} + \varepsilon_t$$

where $C$ is an (N × 1) constant vector, $\theta_i$ is the (N × N) matrix of regression coefficients between the current variables and the variables at lag i, and $\varepsilon_t$ is an (N × 1) white-noise vector. In other words, $X_{1t}$ is explained by its own past values and the past values of the other endogenous variables in $X_t$, and the remainder that these variables cannot explain is captured by the white noise $\varepsilon_t$. The current values of the multivariate stationary time series are thus jointly explained by the past values of all the series [53]. The VAR model applies to stationary time series. If non-stationary series are used, their mean and covariance change over time and the model cannot be estimated correctly. Therefore, before estimating a VAR model, a unit root test is needed to determine whether each time series is stationary. In this study, we applied the widely used augmented Dickey-Fuller (ADF) test to check the stationarity of each variable [55], and non-stationary series were converted to stationary ones by differencing before the model was estimated.
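To make the VAR(p) recursion concrete, the following dependency-free sketch computes a one-step-ahead forecast, C + θ1·X(t−1) + ... + θp·X(t−p), from given coefficients; the white-noise term has mean zero and drops out of the forecast. The coefficient values and the function name are hypothetical, and estimating C and θi (for example by least squares, equation by equation) is outside the scope of this sketch.

```python
def var_one_step_forecast(history, const, thetas):
    """One-step-ahead VAR(p) forecast: C + sum_i theta_i X_{t-i}.

    history: past observation vectors, oldest first (at least p of them)
    const:   the (N x 1) constant vector C as a list of N floats
    thetas:  [theta_1, ..., theta_p], each an N x N nested list
    """
    n = len(const)
    if len(history) < len(thetas):
        raise ValueError("need at least p past observations")
    forecast = list(const)
    for lag, theta in enumerate(thetas, start=1):
        x_lag = history[-lag]  # X_{t-lag}
        for row in range(n):
            forecast[row] += sum(theta[row][col] * x_lag[col] for col in range(n))
    return forecast


# Hypothetical two-variable VAR(1) coefficients, for illustration only.
C = [1.0, 0.0]
theta_1 = [[0.5, 0.1],
           [0.0, 0.2]]
print(var_one_step_forecast([[2.0, 4.0]], C, [theta_1]))  # approximately [2.4, 0.8]
```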
In its general form, with a drift $\alpha$, a time trend $\beta t$ and k lagged differences, the ADF regression is

$$\Delta X_t = \alpha + \beta t + \gamma X_{t-1} + \sum_{i=1}^{k} \delta_i \Delta X_{t-i} + \varepsilon_t,$$

and the null hypothesis of a unit root ($H_0: \gamma = 0$) is tested against the stationary alternative $\gamma < 0$.

In general, the VAR model aims to reflect the composite influence among the endogenous variables in the model. The composition of the variables is therefore important in model estimation, and their order is particularly meaningful because the impulse response results can differ depending on it [56,57]. In this study, the Granger causality test, which is conducted within the VAR framework, was used to determine the order of the variables based on their precedence.
The Granger causality test compares the following two regressions:

$$X_t = \alpha_0 + \sum_{i=1}^{p} \alpha_i X_{t-i} + u_t \tag{5}$$

$$X_t = \beta_0 + \sum_{i=1}^{p} \beta_i X_{t-i} + \sum_{i=1}^{p} \gamma_i Y_{t-i} + v_t \tag{6}$$

When estimating the stationary time series $X_t$, if adding the p lagged values of $Y_t$ in Equation (6) to the p lagged values of $X_t$ in Equation (5) significantly improves the model (that is, the $\gamma_i$ are jointly significant), $Y_t$ is said to Granger-cause $X_t$ [56][57][58][59]. Thus, based on the Granger causality results among the endogenous variables, the order of the variables in the VAR model can be determined by taking into account the order in which each variable affects the prediction target.
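The comparison of a restricted and an unrestricted regression can be illustrated with a small F-test in plain Python. This is a sketch on synthetic data, not the study's implementation: the helper names are hypothetical, and OLS is solved through the normal equations, which is adequate only for small, well-conditioned problems. In practice, statsmodels offers `grangercausalitytests` in `statsmodels.tsa.stattools`, which also reports p-values.

```python
import random


def _solve(a, b):
    """Solve the linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def _ols_rss(design, y):
    """Residual sum of squares of an OLS fit via the normal equations."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    beta = _solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2
               for row, yi in zip(design, y))


def granger_f_stat(x, y, p):
    """F statistic for H0: the p lags of y add nothing to an AR(p) model of x.

    Restricted model:   x_t on [1, x_{t-1}, ..., x_{t-p}]              (Eq. 5)
    Unrestricted model: the same regressors plus y_{t-1}, ..., y_{t-p}  (Eq. 6)
    A large F relative to the F(p, n - 2p - 1) distribution suggests y Granger-causes x.
    """
    target = x[p:]
    restricted, unrestricted = [], []
    for t in range(p, len(x)):
        own_lags = [1.0] + [x[t - i] for i in range(1, p + 1)]
        restricted.append(own_lags)
        unrestricted.append(own_lags + [y[t - i] for i in range(1, p + 1)])
    rss_r = _ols_rss(restricted, target)
    rss_u = _ols_rss(unrestricted, target)
    df_resid = len(target) - (2 * p + 1)
    return ((rss_r - rss_u) / p) / (rss_u / df_resid)


# Synthetic data (hypothetical): x reacts to y with a one-step delay.
rng = random.Random(1)
y = [rng.gauss(0.0, 1.0) for _ in range(300)]
x = [0.0] + [0.8 * y[t - 1] + rng.gauss(0.0, 0.5) for t in range(1, 300)]
f_y_to_x = granger_f_stat(x, y, p=2)  # large: lags of y help predict x
f_x_to_y = granger_f_stat(y, x, p=2)  # small: lags of x do not help predict y
print(round(f_y_to_x, 1), round(f_x_to_y, 2))
```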
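Whereas the lag order of an AR model is often chosen using the partial autocorrelation function (PACF), the lag order of a VAR model is usually chosen with information criteria based on the residual covariance matrix $\hat{\Sigma}(p)$ of the VAR(p) model: the value of p that minimizes the following statistics is used as the time lag of the model, where T is the number of observations and N is the number of endogenous variables:

$$\mathrm{AIC}(p) = \ln\lvert\hat{\Sigma}(p)\rvert + \frac{2pN^2}{T}$$

$$\mathrm{BIC}(p) = \ln\lvert\hat{\Sigma}(p)\rvert + \frac{pN^2 \ln T}{T}$$

$$\mathrm{HQIC}(p) = \ln\lvert\hat{\Sigma}(p)\rvert + \frac{2pN^2 \ln(\ln T)}{T}$$

In this study, these minimization criteria are used to determine the time lag of the model, considering that the predictive performance of VAR models decreases as the number of parameters to be estimated increases [60].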
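The lag-selection rule can be sketched as follows, using the standard forms of AIC, BIC and HQIC built from the log-determinant of the residual covariance matrix, with p·N² penalised lag coefficients (deterministic terms are ignored for simplicity). The log-determinants and the sample size `n_obs=700` are hypothetical numbers, chosen only to show why BIC, whose penalty grows with ln T, tends to pick shorter lags than AIC.

```python
import math


def lag_order_criteria(log_det_sigma, p, n_vars, n_obs):
    """AIC, BIC and HQIC of a VAR(p) given ln|Sigma_hat(p)|.

    Only the p * n_vars**2 lag coefficients are counted in the penalty.
    """
    k = p * n_vars ** 2
    return {
        "AIC": log_det_sigma + 2 * k / n_obs,
        "BIC": log_det_sigma + k * math.log(n_obs) / n_obs,
        "HQIC": log_det_sigma + 2 * k * math.log(math.log(n_obs)) / n_obs,
    }


def select_lag(log_dets, n_vars, n_obs):
    """Return the lag p minimising each criterion, given {p: ln|Sigma_hat(p)|}."""
    best = {}
    for name in ("AIC", "BIC", "HQIC"):
        best[name] = min(
            log_dets,
            key=lambda p: lag_order_criteria(log_dets[p], p, n_vars, n_obs)[name],
        )
    return best


# Hypothetical log-determinants that shrink as more lags are added.
log_dets = {1: -5.0, 2: -5.5, 3: -5.6}
print(select_lag(log_dets, n_vars=10, n_obs=700))  # {'AIC': 2, 'BIC': 1, 'HQIC': 1}
```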

Hybrid Deep-Learning Model Based VAR & DNN
Deep neural networks (DNN) are among the cutting-edge algorithms of this era of digital transformation. Their use and popularity have increased rapidly in recent years owing to improvements in computing power and the ease of securing big data, and DNN is regarded as a core technology for future industries [61,62]. Unlike common linear models such as the linear regression (LR) model, DNN can take nonlinearity into account, as real-world problems often require. A DNN consists of an input layer that receives the data, multiple hidden layers of nodes, and an output layer that produces the final result (Figure 5). The nodes in each hidden layer are linked step by step toward the output layer, and each node holds an intermediate value computed on the way from the input to the output. A weight is assigned to each link, and each node computes the weighted sum (WS) of its incoming values. The weights are then optimized by back-propagation: the error measured at the output layer is used to compute the update for the weights between each pair of layers through gradient descent, and this process is repeated over a large number of iterations.
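The mechanics just described (weighted sums, a nonlinearity, and back-propagation with gradient descent) can be seen in a minimal, dependency-free sketch. It is a toy with one hidden layer, not the network architecture used in the study; the ReLU activation, the squared-error loss, the learning rate and the synthetic target are all assumptions for illustration.

```python
import math
import random


class TinyMLP:
    """One hidden ReLU layer and a linear output, trained by back-propagation."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(n_in)
        self.w1 = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-scale, scale) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        # Weighted sum (WS) at each hidden node, then the ReLU nonlinearity.
        ws = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(self.w1, self.b1)]
        h = [max(0.0, z) for z in ws]
        y = sum(w * hi for w, hi in zip(self.w2, h)) + self.b2
        return ws, h, y

    def train_step(self, x, target, lr):
        """One gradient-descent update on the loss 0.5 * (y - target)**2."""
        ws, h, y = self.forward(x)
        err = y - target  # dLoss/dy, the error measured at the output layer
        for j in range(len(h)):
            # Propagate the error back through w2[j] and the ReLU derivative,
            # using the value of w2[j] from before this update.
            grad_ws = err * self.w2[j] * (1.0 if ws[j] > 0.0 else 0.0)
            self.w2[j] -= lr * err * h[j]
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * grad_ws * xi
            self.b1[j] -= lr * grad_ws
        self.b2 -= lr * err
        return 0.5 * err * err

    def mean_loss(self, data):
        return sum(0.5 * (self.forward(x)[2] - t) ** 2 for x, t in data) / len(data)


# Toy regression data (hypothetical): target = 2*x0 - x1 + 0.5.
rng = random.Random(42)
data = []
for _ in range(50):
    x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    data.append((x, 2 * x[0] - x[1] + 0.5))

net = TinyMLP(n_in=2, n_hidden=8)
before = net.mean_loss(data)
for _epoch in range(300):
    for x, t in data:
        net.train_step(x, t, lr=0.01)
after = net.mean_loss(data)
print(f"mean loss before: {before:.3f}, after: {after:.3f}")
```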
Since the weighted sum (WS) of each node is determined sequentially by the inputs $x_i$ used in the input layer for learning, the selection of the variables used for learning has a strong influence on DNN performance [63].
The weighted sum is computed as follows:

$$WS_j = \sum_{i=1}^{n} w_{ij} x_i + b_j \tag{10}$$

where $w_{ij}$ is the weight of the link from input $x_i$ to node j and $b_j$ is the bias of node j. At a glance, this formula resembles linear regression. Therefore, in this study, if the DNN inputs are structured according to the variable configuration of the estimated VAR model, the weighted sum in Equation (10) takes the same lagged form as the VAR, shown in Equation (11), which can substantially improve the prediction performance of the DNN on multivariate time series with autoregressive (AR) structure. In other words, the predictive performance of the DNN model can be enhanced by using input variables with the lag p estimated through the VAR model.
The weighted sum of the hybrid DNN is as follows:

$$WS_j = \sum_{k=1}^{N} \sum_{i=1}^{p} w_{k,i,j} X_{k,t-i} + b_j \tag{11}$$

where $X_{k,t-i}$ is the value of the k-th endogenous variable at lag i, with N and p taken from the estimated VAR(p) model. As a result, we propose a hybrid deep learning model, with an open innovation perspective, that comprehensively considers linearity and nonlinearity in multivariate time series by connecting VAR and DNN in series [64]. The structure of the hybrid DNN model is shown in Figure 6.
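The serial VAR-to-DNN connection amounts to feeding the network the same lagged variables that the VAR(p) model uses. A minimal sketch of building such inputs is shown below; the function name, the `target_index` parameter and the toy series are hypothetical, and the study's actual preprocessing (for example differencing or scaling) is not reproduced.

```python
def build_lagged_inputs(series, p, target_index):
    """Turn a multivariate series into VAR(p)-style DNN training pairs.

    series:       observation vectors X_t (each of length N), oldest first
    p:            lag order, e.g. the one selected for the VAR model
    target_index: position of the prediction target inside each X_t
    Each input row is [X_{t-1}, X_{t-2}, ..., X_{t-p}] flattened (N * p values).
    """
    inputs, targets = [], []
    for t in range(p, len(series)):
        row = []
        for lag in range(1, p + 1):
            row.extend(series[t - lag])
        inputs.append(row)
        targets.append(series[t][target_index])
    return inputs, targets


# Hypothetical series with N = 2 variables: [pollutant level, asthma count].
series = [[1.0, 10], [2.0, 20], [3.0, 30], [4.0, 40]]
inputs, targets = build_lagged_inputs(series, p=2, target_index=1)
print(inputs)   # [[2.0, 20, 1.0, 10], [3.0, 30, 2.0, 20]]
print(targets)  # [30, 40]
```

With p = 1, as selected for this study's VAR model, each input row simply contains the previous day's values of all endogenous variables.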

Unit Root Test
Each endogenous variable must be a stationary time series for the assumptions required to estimate the VAR model correctly to hold. Thus, in this study, a unit root test was performed on the data with the ADF test, and whenever the null hypothesis of a unit root was not rejected at the 5% significance level, the series was first-differenced to make it stationary. The results are shown in Table 4. The ADF test showed that O3 (atmospheric variable) and temperature and air pressure (meteorological variables) were non-stationary at the 5% significance level; their non-stationarity can also be observed in Figures 3 and 4. To convert them into stationary time series, we took their first differences and applied the ADF test again, and, as shown in Table 4, all variables were then confirmed to be stationary. Finally, we were able to construct the model using stationary time series.
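The difference-and-retest procedure can be illustrated with a simplified Dickey-Fuller test in plain Python. This sketch omits the lagged-difference terms that make the test "augmented" and has no trend term; −2.86 is the commonly tabulated asymptotic 5% critical value for the constant-only case. The random walk is synthetic, and the helper names are hypothetical. A full ADF test with automatic lag selection is available as `adfuller` in `statsmodels.tsa.stattools`.

```python
import math
import random

# Approximate 5% asymptotic critical value of the Dickey-Fuller t statistic
# for a regression with a constant and no trend.
DF_CRITICAL_5PCT = -2.86


def first_difference(x):
    """Delta x_t = x_t - x_{t-1}; the result is one element shorter."""
    return [b - a for a, b in zip(x[:-1], x[1:])]


def dickey_fuller_t(x):
    """t statistic of gamma in  dx_t = alpha + gamma * x_{t-1} + e_t.

    Simplified DF test: no lagged differences and no trend term.
    """
    dx = first_difference(x)
    lagged = x[:-1]
    n = len(dx)
    mean_l = sum(lagged) / n
    mean_d = sum(dx) / n
    sxx = sum((v - mean_l) ** 2 for v in lagged)
    sxy = sum((v - mean_l) * (d - mean_d) for v, d in zip(lagged, dx))
    gamma = sxy / sxx
    alpha = mean_d - gamma * mean_l
    resid = [d - alpha - gamma * v for v, d in zip(lagged, dx)]
    sigma2 = sum(r * r for r in resid) / (n - 2)
    return gamma / math.sqrt(sigma2 / sxx)


def is_stationary(x):
    """Reject the unit-root null at roughly the 5% level."""
    return dickey_fuller_t(x) < DF_CRITICAL_5PCT


# Hypothetical random walk: a unit-root series whose first difference is white noise.
rng = random.Random(7)
walk = [0.0]
for _ in range(499):
    walk.append(walk[-1] + rng.gauss(0.0, 1.0))
diffed = first_difference(walk)
print(round(dickey_fuller_t(walk), 2), round(dickey_fuller_t(diffed), 2))
print(is_stationary(diffed))  # True
```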

Granger Causality
Because the VAR model is estimated to reflect the composite influence among the variables in the model, it is necessary to ensure that each variable has a sequential influence on the asthmatic occurrence and to determine whether it can have a causal relationship with the actual asthmatic occurrence. In this study, we used the aforementioned Granger causality test to derive a logical order for the variables used in the model. Table 5 shows the results of the Granger causality test for the endogenous variables in this model. The results indicate that all variables except O3 Granger-cause the asthmatic occurrence. O3 does not directly Granger-cause the asthmatic occurrence, but it was confirmed to Granger-cause endogenous variables, such as SO2 and CO, that do affect the asthmatic occurrence. It is therefore appropriate to include O3 as an endogenous variable in the model, as it is judged to affect the asthmatic occurrence through sequential interaction with other variables. The test results also revealed a number of variable pairs with feedback (bilateral) causality, which may indicate that other exogenous variables are involved in the causality between the two variables; this suggests that some exogenous variables should be used in addition to the endogenous variables. Finally, based on the causality results and the degree of exogeneity in the Granger causality test, the variables of the VAR model were ordered as AiPr, O3, Temp, PM2.5, CO, PM10, Hum, NO2, SO2 and AsO. In addition, a dummy variable (Hol) for the days immediately before and after a holiday or weekend and seasonal dummy variables (Su, Au, Wi) were added to account for the seasonality and yearly trend of the endogenous variables when estimating the VAR model [65,66].
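One possible construction of these dummy variables is sketched below. The text does not define the season boundaries or exactly how Hol is coded, so this is an assumption-laden illustration: Hol = 1 when the previous or next calendar day is a weekend day or holiday, seasons follow meteorological months, and spring is treated as the omitted baseline because only Su, Au and Wi are listed. The holiday set is hypothetical.

```python
from datetime import date, timedelta


def is_working_day(day, holidays):
    """True for Monday-Friday dates that are not in `holidays`."""
    return day.weekday() < 5 and day not in holidays


def dummy_variables(day, holidays):
    """Hypothetical construction of the Hol, Su, Au and Wi dummies for one day.

    Hol = 1 if the previous or next calendar day is a weekend day or holiday.
    Seasons use meteorological months (an assumption): Mar-May spring (the
    baseline, all season dummies 0), Jun-Aug summer, Sep-Nov autumn, Dec-Feb winter.
    """
    prev_day, next_day = day - timedelta(days=1), day + timedelta(days=1)
    hol = int(not is_working_day(prev_day, holidays) or not is_working_day(next_day, holidays))
    month = day.month
    return {
        "Hol": hol,
        "Su": int(month in (6, 7, 8)),
        "Au": int(month in (9, 10, 11)),
        "Wi": int(month in (12, 1, 2)),
    }


holidays = {date(2018, 1, 1)}  # hypothetical holiday set
print(dummy_variables(date(2018, 1, 2), holidays))  # {'Hol': 1, 'Su': 0, 'Au': 0, 'Wi': 1}
print(dummy_variables(date(2018, 1, 3), holidays))  # {'Hol': 0, 'Su': 0, 'Au': 0, 'Wi': 1}
```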

Estimation of VAR Model
As shown in Table 6, AIC, BIC and HQIC were used to select the optimal time lag p of the VAR model with 10 endogenous variables and 4 dummy variables. The criteria disagreed: BIC selected an optimal lag of 1, whereas AIC selected 6 and HQIC selected 3. Because including too many parameters in the model could cause estimation problems through the loss of degrees of freedom, this study estimated VAR(1) for the asthmatic occurrence based on BIC.
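Counting coefficients makes the degrees-of-freedom concern concrete. The sketch assumes, in line with the constant vector C in the VAR definition, that every equation contains a constant, and that the four dummy variables enter every equation; the function name is hypothetical.

```python
def var_parameter_count(p, n_endog=10, n_dummies=4, constant=True):
    """Coefficients per equation and in total for a VAR(p) with exogenous dummies.

    Each of the n_endog equations has n_endog * p lag coefficients, one
    coefficient per dummy variable and (optionally) a constant.
    """
    per_equation = n_endog * p + n_dummies + (1 if constant else 0)
    return per_equation, per_equation * n_endog


for lag in (1, 3, 6):  # lags selected by BIC, HQIC and AIC, respectively
    per_eq, total = var_parameter_count(lag)
    print(f"VAR({lag}): {per_eq} coefficients per equation, {total} in total")
# VAR(1): 15 coefficients per equation, 150 in total
# VAR(3): 35 coefficients per equation, 350 in total
# VAR(6): 65 coefficients per equation, 650 in total
```

Every equation is fitted on the same training days, so moving from VAR(1) to VAR(6) more than quadruples the coefficients each equation must estimate from the same observations.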
The VAR (1) equation for asthmatic occurrence is given in Equation (12). Table 7 shows the estimation results for the asthmatic-occurrence equation, the one used in this study, among the estimated VAR (1) equations. Based on the p-values in VAR (1), asthmatic occurrence can be interpreted as being significantly affected by the previous day's asthmatic occurrence, holidays, temperature, NO2 and SO2. In addition, the variance inflation factors (VIFs), which are typically used to check for multicollinearity, were all below 10, indicating that no variable in the model suffers from multicollinearity [67][68][69].
Figure 7 shows the impulse responses of asthmatic occurrence to the endogenous variables. First, a 1 hPa impulse in air pressure increased asthmatic occurrence by about 9.25 after 1 day, and this influence converged to zero after 5 days. A 1 ppm impulse in O3 caused an increase of about 2575.59 after one day and 3423.64 after two days, with its influence decreasing rapidly thereafter. A 1 °C impulse in temperature can decrease asthmatic occurrence by about 54.15, and its influence was confirmed to converge to zero after about 7 days. This result differs slightly from the findings of the preceding study [37]; as shown in Figures 2 and 4, it can be interpreted as indicating that the concentration of allergens at high temperatures has relatively little influence on the frequency of actual asthma-related visits. A 1 µg/m3 impulse in PM2.5 can decrease asthmatic occurrence by 54.15 after one day, with the influence converging to zero after about seven days; however, judging from the p-value and the confidence interval in Figure 7, this response does not appear to be highly credible.
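The multicollinearity check mentioned above relies on the variance inflation factor, VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing regressor j on all other regressors plus an intercept. A minimal NumPy sketch follows; the function name and the simulated columns are illustrative assumptions, and the cutoff of 10 is the rule of thumb cited in the text.

```python
import numpy as np


def variance_inflation_factors(X):
    """VIF of each column of X: 1 / (1 - R^2) from regressing that column
    on all the other columns plus an intercept."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = []
    for j in range(k):
        target = X[:, j]
        design = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ beta
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)


# Example: two independent regressors, then a third that nearly copies the first
rng = np.random.default_rng(2)
a, b = rng.normal(size=(2, 1000))
c = a + 0.05 * rng.normal(size=1000)
print(variance_inflation_factors(np.column_stack([a, b])))      # both close to 1
print(variance_inflation_factors(np.column_stack([a, b, c])))   # a and c far above 10
```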
A 1 ppm impulse in CO could increase asthmatic occurrence by about 292.69 after one day, and its influence could last for a relatively long period. A 1 µg/m3 impulse in PM10 decreased asthmatic occurrence by approximately 0.99 after one day, and its influence immediately converged to zero; as with PM2.5, the confidence interval and p-value are considered unreliable. A 1% impulse in humidity increased asthmatic occurrence by about 3.47 after one day, and the influence converged to zero after about three days, reaffirming, as in the case of temperature, that the influence of allergen concentration on asthmatic occurrence is not significant. A 1 ppm impulse in NO2 could increase asthmatic occurrence by about 12,622.22, and its influence could persist beyond two weeks. Finally, a 1 ppm impulse in SO2 could increase asthmatic occurrence by about 95,442.85 after one day, and, as in the case of NO2, its influence could persist for a rather long period. To sum up, some variables were not highly significant in terms of p-value, so the confidence intervals of their impulse responses contain both positive (+) and negative (−) values. We believe these responses can still be interpreted logically on the basis of empirical judgment and previous research.
In addition, the fact that these interpretations of the meteorological environment differ somewhat from previous studies seems to reflect the distinctive characteristics of this study, which uses large, generalized data rather than small clinical data, and this is considered noteworthy. Given that the impulse responses last longer for variables placed later in the ordering, the variable ordering derived from the Granger causality results appears to have been configured correctly.
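For a VAR (1) such as the one estimated here, impulse responses follow directly from powers of the lag-coefficient matrix A. The sketch below is a minimal NumPy illustration under stated assumptions: with no residual covariance supplied it returns responses to a one-unit shock (A^h e_j); with a covariance Σ_u it returns Cholesky-orthogonalized responses (A^h P, with P the lower Cholesky factor of Σ_u), which depend on the variable ordering. The text does not state which variant Figure 7 uses, and the function name and example matrix are hypothetical.

```python
import numpy as np


def var1_impulse_responses(A, sigma_u=None, horizon=10):
    """Impulse responses of a VAR(1): y_t = c + A y_{t-1} + u_t.

    Returns an array of shape (horizon + 1, K, K); entry [h, i, j] is the
    response of variable i, h periods after a shock to variable j.
    sigma_u=None: unit shock to variable j, i.e. A^h e_j.
    sigma_u given: orthogonalized shock A^h P, with P the lower Cholesky
    factor of sigma_u, so the result depends on the variable ordering.
    """
    A = np.asarray(A, dtype=float)
    K = A.shape[0]
    impact = np.eye(K) if sigma_u is None else np.linalg.cholesky(sigma_u)
    responses = np.empty((horizon + 1, K, K))
    power = np.eye(K)                          # A^0
    for h in range(horizon + 1):
        responses[h] = power @ impact
        power = power @ A                      # advance to A^(h+1)
    return responses


# Example: a shock to variable 0 feeds into variable 1 and then dies out
A = np.array([[0.5, 0.0], [0.3, 0.2]])
irf = var1_impulse_responses(A, horizon=30)
print(irf[1][:, 0])   # [0.5 0.3]
print(irf[2][:, 0])   # approximately [0.25 0.21]
```

Because every eigenvalue of this example A lies inside the unit circle, the responses converge to zero, which mirrors the convergence behaviour described for most variables in Figure 7.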
Finally, Figure 8 shows the predictive performance of the final VAR (1) model for asthmatic occurrence, the target of this study, on the test set. The estimated model appears to capture the overall pattern of increases and decreases, but because it is a linear OLS-based model, some of its predictions were limited, for example for surges in patients under abnormal conditions that reflect nonlinear patterns.
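A VAR (1) with deterministic dummies can be estimated equation by equation with OLS and then used for one-step-ahead forecasts, which is the kind of linear prediction evaluated on the test set. The sketch below assumes the form y_t = c + A y_{t-1} + D d_t + u_t, with the dummies d_t known on day t; the function name and the simulated data are hypothetical. For simplicity it skips the first test day, whose lag falls in the training period.

```python
import numpy as np


def fit_predict_var1x(train_endog, train_dummies, test_endog, test_dummies):
    """OLS fit of y_t = c + A y_{t-1} + D d_t + u_t, then one-step forecasts.

    Returns forecasts for test rows 1..n-1 (row 0 is skipped because its lag
    falls in the training period), each conditioned on the observed previous day.
    """
    def design(endog, dummies):
        endog = np.asarray(endog, dtype=float)
        dummies = np.asarray(dummies, dtype=float)
        n = len(endog) - 1
        return np.hstack([np.ones((n, 1)), endog[:-1], dummies[1:]])

    X_train = design(train_endog, train_dummies)
    Y_train = np.asarray(train_endog, dtype=float)[1:]
    coef, *_ = np.linalg.lstsq(X_train, Y_train, rcond=None)   # one column per equation
    return design(test_endog, test_dummies) @ coef


# Example: a 2-variable VAR(1) with one binary dummy (e.g. a holiday flag)
rng = np.random.default_rng(3)
A = np.array([[0.6, 0.2], [0.0, 0.5]])
c, D = np.array([1.0, 0.5]), np.array([[3.0], [-1.0]])
d = rng.integers(0, 2, size=(400, 1)).astype(float)
y = np.zeros((400, 2))
for t in range(1, 400):
    y[t] = c + A @ y[t - 1] + D @ d[t] + rng.normal(scale=0.1, size=2)
pred = fit_predict_var1x(y[:300], d[:300], y[300:], d[300:])
print(np.mean(np.abs(pred - y[301:])))   # close to the noise level
```

Since every forecast is a linear function of the previous day's values, such a model cannot by itself reproduce the nonlinear surges noted above.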

Extract Features from VAR Model
The variables used in estimating the previous VAR (1) consist of the lagged values y_{t-1} and x_{t-1} for the predicted value y_t (X_{AsO,t}), together with the dummy variables, as in Equation (12). The hybrid DNN model used in this study takes these same variables as its inputs, and the final form of the model is shown in Figure 9.
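The input construction described above can be sketched as a simple array transformation. This NumPy example assumes that all endogenous variables enter at lag 1 while the dummies (holiday and seasonal flags) enter at day t, since they are known in advance; the function name and the toy arrays are hypothetical.

```python
import numpy as np


def build_var1_features(endog, dummies, target_col=0):
    """Build (inputs, target) pairs that mirror a VAR(1) equation.

    endog:   array (T, K) of endogenous series; column target_col is the
             series to predict (e.g. daily asthma visits).
    dummies: array (T, D) of deterministic dummies, assumed known at time t.
    Output row t-1 uses endog[t-1] (all K lagged variables) and dummies[t]
    as inputs, and endog[t, target_col] as the target.
    """
    endog = np.asarray(endog, dtype=float)
    dummies = np.asarray(dummies, dtype=float)
    inputs = np.hstack([endog[:-1], dummies[1:]])
    target = endog[1:, target_col]
    return inputs, target


# Example: 4 days, 3 endogenous variables, 1 dummy
endog = np.arange(12.0).reshape(4, 3)
dummies = np.array([[0], [1], [0], [1]])
X, y = build_var1_features(endog, dummies)
print(X[0], y[0])   # [0. 1. 2. 1.] 3.0
```

With the study's 10 endogenous variables and 4 dummies, each input row would have 14 columns.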
The model was designed to capture complex patterns among the variables, with three hidden layers of 128 nodes each, and its hyperparameters were set to a learning rate of 0.001 and 300 epochs. To prevent overfitting to the training set, dropout was applied to exclude some nodes in each hidden layer during training. Because each row of the input already reflects the time-series characteristics of the raw data through its lagged variables, training was conducted on randomly sampled rows rather than in time order. The model was optimized with the Adam optimizer, and mean absolute error (MAE) was used as the loss function.
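The architecture and training setup described above (three hidden layers of 128 nodes, dropout, Adam with a learning rate of 0.001, MAE loss, 300 epochs, randomly shuffled rows) can be sketched in plain NumPy. The ReLU activation, dropout rate of 0.2, batch size of 32 and He initialization are assumptions, since the text does not state them, and the class name `HybridMLP` is hypothetical; this is an illustration of the technique, not the study's implementation, which would more typically be written in a deep learning framework.

```python
import numpy as np


class HybridMLP:
    """Minimal NumPy MLP regressor (hypothetical sketch): ReLU hidden layers,
    inverted dropout during training, Adam updates and an MAE loss."""

    def __init__(self, n_in, hidden=(128, 128, 128), dropout=0.2, lr=1e-3, seed=0):
        self.rng = np.random.default_rng(seed)
        sizes = [n_in, *hidden, 1]
        # He initialization suits ReLU layers
        self.W = [self.rng.normal(0.0, np.sqrt(2.0 / a), size=(a, b))
                  for a, b in zip(sizes[:-1], sizes[1:])]
        self.b = [np.zeros(b) for b in sizes[1:]]
        self.params = self.W + self.b          # same array objects, updated in place
        self.m = [np.zeros_like(p) for p in self.params]
        self.v = [np.zeros_like(p) for p in self.params]
        self.dropout, self.lr, self.t = dropout, lr, 0

    def _forward(self, X, train):
        acts, masks, h = [X], [], X
        for W, b in zip(self.W[:-1], self.b[:-1]):
            h = np.maximum(h @ W + b, 0.0)
            mask = None
            if train and self.dropout > 0:
                # Inverted dropout: zero some units and rescale the rest
                mask = (self.rng.random(h.shape) >= self.dropout) / (1.0 - self.dropout)
                h = h * mask
            masks.append(mask)
            acts.append(h)
        out = h @ self.W[-1] + self.b[-1]
        return out[:, 0], acts, masks

    def _adam(self, grads, b1=0.9, b2=0.999, eps=1e-8):
        self.t += 1
        for p, g, m, v in zip(self.params, grads, self.m, self.v):
            m[...] = b1 * m + (1 - b1) * g
            v[...] = b2 * v + (1 - b2) * g * g
            m_hat = m / (1 - b1 ** self.t)
            v_hat = v / (1 - b2 ** self.t)
            p -= self.lr * m_hat / (np.sqrt(v_hat) + eps)

    def _step(self, X, y):
        pred, acts, masks = self._forward(X, train=True)
        g = (np.sign(pred - y) / len(y))[:, None]   # gradient of MAE w.r.t. the output
        gW, gb = [None] * len(self.W), [None] * len(self.b)
        gW[-1], gb[-1] = acts[-1].T @ g, g.sum(axis=0)
        g = g @ self.W[-1].T
        for i in range(len(self.W) - 2, -1, -1):
            g = g * (acts[i + 1] > 0)               # ReLU derivative
            if masks[i] is not None:
                g = g * masks[i]                    # dropout mask and scaling
            gW[i], gb[i] = acts[i].T @ g, g.sum(axis=0)
            g = g @ self.W[i].T
        self._adam(gW + gb)

    def fit(self, X, y, epochs=300, batch_size=32):
        for _ in range(epochs):
            order = self.rng.permutation(len(y))    # random sampling of rows
            for start in range(0, len(y), batch_size):
                idx = order[start:start + batch_size]
                self._step(X[idx], y[idx])
        return self

    def predict(self, X):
        return self._forward(X, train=False)[0]    # dropout disabled at inference
```

In use, the inputs would be the lagged endogenous variables and dummies for each day, and the target the next day's asthmatic occurrence, e.g. `HybridMLP(n_in=X.shape[1]).fit(X, y).predict(X_test)`; scaling the inputs beforehand would usually be advisable.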

Performance Evaluation of Hybrid DNN
The results of verification using the test set for the proposed hybrid DNN model are shown in Figures 10 and 11. The loss curve decreased smoothly during training, and the model also showed strong predictive performance on the randomly sampled test set. The model's mean absolute error (MAE) for asthmatic occurrence is approximately 479, which corresponds to an error of about 8.17% relative to the daily average of about 5860.
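The relationship between the MAE and the reported percentage error is a simple ratio to the mean of the actual values. The helper below is a minimal sketch with a hypothetical name; the final line only reproduces the arithmetic from the figures stated above.

```python
import numpy as np


def mae_and_relative_error(actual, predicted):
    """Return (MAE, MAE as a percentage of the mean actual value)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    mae = np.mean(np.abs(actual - predicted))
    return mae, 100.0 * mae / np.mean(actual)


# Arithmetic behind the reported figure: MAE of about 479 vs. a daily mean of about 5860
print(round(100 * 479 / 5860, 2))   # 8.17
```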
To validate the performance of this hybrid DNN, its predictive performance was compared with that of other algorithms on the same data. Table 8 shows the performance measurements for a general DNN, the VAR model and long short-term memory (LSTM), which are known to have high predictive performance on time-series data. On every performance measure, the hybrid DNN performed best, which we attribute to its combination of the linearity of VAR with the nonlinearity of the DNN.

Conclusions
Good air quality and a favorable living environment are crucial for sustainable human life. Nevertheless, air pollution problems continue to intensify owing to increased human activity tied to industrial development. Although respiratory diseases such as asthma are high-risk candidates that can be immediately affected by these problems, most countries do not prioritize the management of asthma. This study proposes a model to predict asthmatic occurrence using a deep neural network algorithm, whose usefulness for solving existing problems has grown with advances in computing power and the potential of big-data collection. The proposed hybrid of VAR and DNN, referred to as the hybrid DNN model, has the advantage of reflecting both linear and nonlinear patterns in the data simultaneously during training. The proposed model is also meaningful in that VAR helps overcome a limitation of DNNs, namely that the effects of individual variables cannot be interpreted from the model. The impulse responses of VAR confirmed the influence of each variable on asthmatic occurrence, showing that seasonality, temperature and SO2 had the greatest influence in the model. Variables that previous clinical case-based studies considered to have a significant impact on asthma, such as particulate matter, were found not to be significant for asthmatic occurrence. In addition, this study's finding that lower temperatures, as in winter, could adversely affect asthma is inconsistent with previous studies. This indicates that the pattern in the data shown in Figure 2, in which the number of asthma patients increases in winter, has been appropriately reflected in the model.
Unlike previous studies, this approach is meaningful in that it can identify patterns of asthmatic occurrence in relation to the atmospheric and meteorological environment and also identify the general impact of each variable on the disease through the impulse responses of the VAR. In fact, atmospheric and meteorological factors in the real world are interrelated and can be viewed as interacting variables that function like a single integrated variable. Therefore, to reflect these real-world interrelations and to enhance the predictive performance of the hybrid model, we used the ADF test and a VAR model in which each variable has a concurrent effect on asthmatic occurrence, with the optimal time lag determined by the minimum value of either the Akaike information criterion (AIC) or the Schwarz information criterion (SIC). Finally, the performance of the proposed model was compared with that of other time-series forecasting models, confirming that it performed best, with a mean absolute error (MAE) of about 8.17% of the daily average, which underscores that the model is highly applicable.
Unlike preceding studies conducted on limited samples, such as clinical and cohort data for conventional asthma patients, this study presents a new direction for atmospheric and meteorological data-driven disease research, in that constantly updated, large-scale open data for prediction modeling and analysis are credibly collected at the national level. The model could also be applied widely in any country where such data can be obtained, by adjusting its parameters appropriately. These features of the hybrid deep learning algorithm offer the possibility of sustainable research expansion from the perspective of creative open innovation [70].
In summary, we believe that predicting future asthmatic occurrence with our proposed hybrid deep learning model, from an open innovation perspective, will improve efficiency in the management and resource allocation of national health insurance and budgets, as well as in the deployment of medical personnel in hospitals [70]. In addition, by selecting criteria for the number of predicted patients and developing asthma risk indexes, early notification systems based on the atmospheric environment for chronic patients will contribute to improving national health and socioeconomic productivity. The proposed model is also expected to be broadly applicable to disease analysis in general, as it can be used to predict the number of patients with other diseases; the prediction range can be extended to all disease groups, so that data- and model-based alerts could help address current sustainability challenges from an open innovation perspective [71].