Exploring the Relevance of Green Space and Epidemic Diseases Based on Panel Data in China from 2007 to 2016

Urban green space has been proven effective in improving public health against the contemporary background of planetary urbanization. There is a growing body of literature investigating the relationship between non-communicable diseases (NCDs) and green space, whereas the correlation between green space and epidemics, such as dysentery, tuberculosis, and malaria, which still threaten public health worldwide, has seldom been explored. Meanwhile, most studies have examined health issues in relation to general green space, public green space, or green space coverage separately, and the differences in relevance among them have rarely been explored. This study aimed to examine and compare the relevance between these three kinds of green space and the incidences of the three epidemic diseases based on the Panel Data Model (PDM), using time series data of 31 Chinese provinces from 2007 to 2016. The results indicated that different, or even opposite, relevance exists between the various kinds of green space and epidemic diseases, which might be associated with the process of urban sprawl during rapid urbanization in China. This paper provides a reference for re-thinking the indices of green space in building healthier and greener cities.


Introduction
Humans are facing health challenges due to congested spaces and polluted environments within the contemporary process of planetary urbanization [1]. Urban green space has thus become an effective tool for planning healthy cities, offering not only critical ecosystem services but also significant physical and mental health benefits [2].
Due to economic growth, improved health care services, and the decline in rural population, the incidences of epidemics, such as dysentery, tuberculosis, and malaria, have been observed to be decreasing worldwide [19][20][21]. However, heterogeneity in geography and socioeconomic conditions still leads to hot spots with high disease incidences [22][23][24][25][26]. The persistence and re-emergence of epidemics in recent decades have been associated with the process of global urbanization, which has been reshaping environmental, economic, and demographic systems [27]. Moreover, the current climate change crisis, which affects temperature, precipitation, floods, and extreme weather events, is exacerbating the outbreak and spread of infectious diseases [28].
The interrelationship between climate change, urbanization, epidemics, and green space remains complex (Figure 1). Dysentery has been shown to be related to population density, high temperature, and sanitary conditions [29], while urbanization has also brought new challenges in controlling tuberculosis arising from poverty and immigration [30][31][32][33], as well as human mobility [34,35]. Increasing impervious area during urban expansion may shrink the habitat of malaria vectors [36], while immigrants or new residents with low socioeconomic status may face a significant risk of malaria when they enter the territory of vectors in new urban areas located near farmlands, forests, and rivers [37,38]. Few studies have examined the correlation between green space and these three kinds of epidemics, which have been proven relevant to vegetation in the context of urbanization.
Environmental factors play an important role in dysentery infections, such as water pollution [39], distance to farmland [40], floods [41,42], temperature, and humidity [43][44][45]. In addition, people in urban environments are more vulnerable to temperature increases than rural populations [46]. Tuberculosis is transmitted from person to person through airborne particles and is affected by population density, so it may share the same relevance to green space as dysentery [47]. Such factors pose a great challenge for public facilities with high population density [48,49], which may include public parks. Beyond that, two other environmental factors, polluted water and cold weather, are also associated with tuberculosis [50,51].
The role of green space in malaria remains unclear compared with its possible benefits for dysentery and tuberculosis. Malaria is transmitted by infective mosquitoes whose habitat is mainly located in green spaces [52], while the geographical environment provides the habitat for both vectors and humans, affecting the incidence and seasonality of malaria through different probabilities of infection and transmission [53]. Urban green space, with features such as ponds and shrubs, may serve as a livable environment for mosquitoes [54,55]. Some studies report that vegetation is not significantly associated with malaria [56,57], while others verify the positive effect of green space on mitigating temperature, which plays a critical role in malaria outbreaks [58][59][60][61][62].
All the environmental factors influencing these three kinds of epidemics may be directly or indirectly affected by green space, as more and more studies have shown that green space can alleviate air and water pollution [63][64][65], moderate the heat island effect [66], and control floods and waterlogging [67]. Much uncertainty still exists about the relationship between green space and epidemics, although green space has been an important policy tool in building healthy cities (Figure 2). Furthermore, it is necessary to understand the different roles that various kinds of green space play in public health, as green space has been considered a combination of urban green space, agricultural space, and natural green space [68]. Many studies insist that the distance to general green space plays a crucial role in public health [69]. Other research has shown that it is parks that have a more positive effect on the health of nearby residents [70][71][72][73], as they can largely increase physical activity [74,75]. Neighborhood green space, as a component of green space coverage, has also been verified to have positive effects on the surrounding population [76]. Different kinds of green space vary not only in scale, function, and accessibility but also in vegetation cover, ecological dimension, and environmental quality [77]. In the annual official reports of the China Statistical Yearbook (http://www.stats.gov.cn), the Chinese government has been using three similar statistical indicators: Area of Green Space, Area of Parks, and Green Coverage ratio of built-up area. However, the different influences of these three kinds of green space on public health have seldom been investigated and compared, and such a comparison would help to clarify the role of green space in epidemic diseases and public health.
This paper aims to provide a spatiotemporal overview of green space and epidemic diseases in China, in order to explore the correlation between them and to compare the different effects of these three kinds of green space at the macro scale, using provincial statistical data of China over the past decade. The quantitative method adopted is the Panel Data Model (PDM), which provides three models for examining longitudinal data and is superior to traditional statistical analysis [78].

Variables and Data Source
(1) Epidemic disease incidence. Three typical diseases were chosen as the dependent variables: Dysentery, Tuberculosis, and Malaria. The data were extracted from the China Health Statistics Yearbook, 2007-2016 (http://www.nhc.gov.cn/). Incidence is expressed per 100,000 people. A natural log-transformation was applied to the dependent variables to reduce heteroscedasticity.
Considering that economic factors and medical services may also affect public health [79,80], several socioeconomic indicators were also selected as ancillary variables, such as total population (Pop), urban population (Urban Pop), population density (Pop density), built-up area (Built-up), gross domestic product (GDP), and medical workers (Medi_workers). Similarly, the natural log-transformation was applied to all the above variables.
(3) Temperature and Humidity. Since temperature and humidity could affect the incidence of epidemics, the PDM also included climate variables: annual average temperature (Average T), maximum temperature (High T), lowest temperature (Low T), and relative humidity (Humidity). The data are available from the National Meteorological Information Center (http://data.cma.cn/).
The panel data consist of indicators for 31 provinces, autonomous regions, and municipalities over ten years, giving 310 observations (31 × 10 = 310).
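The layout of such a balanced panel can be sketched in a few lines of Python. This is only an illustration: the province labels (`P01` to `P31`) and the helper name `log_transform` are placeholders that do not come from the study.

```python
import math
from itertools import product

# Placeholder province labels; the study uses the 31 real provincial units.
provinces = [f"P{k:02d}" for k in range(1, 32)]
years = range(2007, 2017)  # 2007..2016 inclusive, i.e. ten years

# One (province, year) key per observation in a balanced panel.
panel_index = list(product(provinces, years))


def log_transform(value):
    """Natural log, as applied to incidence and the socioeconomic indicators."""
    if value <= 0:
        raise ValueError("log-transform requires a positive value")
    return math.log(value)


print(len(panel_index))  # number of province-year observations
```

Keying every row by (province, year) is what later allows an estimator to separate variation within a province over time from variation between provinces.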

Methods
(1) Moran's I
To explore the spatial distribution of green space and epidemic diseases, this article used Moran's I as a measure of spatial autocorrelation. Moran's I is defined as:
$$I = \frac{N}{W} \cdot \frac{\sum_{i}\sum_{j} w_{ij}\,(x_i - \bar{x})(x_j - \bar{x})}{\sum_{i}(x_i - \bar{x})^2}$$
where $N$ is the number of spatial units indexed by $i$ and $j$, $x$ is an observed variable, $\bar{x}$ is the mean of $x$, $w_{ij}$ is an element of the matrix of spatial weights, and $W$ is the sum of all $w_{ij}$. The value of Moran's I ranges from −1 to 1, indicating whether the spatial distribution of the variable is dispersed, clustered, or random.
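As a rough illustration of how this statistic behaves, the following pure-Python sketch computes global Moran's I from the symbols defined above. The four-unit chain of neighbours and the two value vectors are invented toy data, not the provincial data of the study, and the function name `morans_i` is hypothetical.

```python
def morans_i(values, weights):
    """Global Moran's I for N values and an N x N spatial weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]            # x_i - mean(x)
    w_sum = sum(sum(row) for row in weights)    # W, the sum of all w_ij
    cross = sum(
        weights[i][j] * dev[i] * dev[j]
        for i in range(n)
        for j in range(n)
    )
    variance_term = sum(d * d for d in dev)
    return (n / w_sum) * (cross / variance_term)


# Toy example: four units in a row, each adjacent to its immediate neighbours.
chain_weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = [1.0, 1.0, 5.0, 5.0]     # similar values sit next to each other
alternating = [1.0, 5.0, 1.0, 5.0]   # neighbours always differ

print(morans_i(clustered, chain_weights))
print(morans_i(alternating, chain_weights))
```

The clustered vector yields a positive value and the alternating vector a negative one, which is the contrast the text describes between clustered and dispersed patterns. Significance testing, as reported in the tables, would require a permutation or analytical test on top of this.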
(2) PDM
PDM is a popular quantitative method for longitudinal data in social science, epidemiology, and econometrics, which increases estimator precision by increasing the number of observations and obtains more dynamic information than a single cross-section of data [81]. PDM contains three kinds of models, the Pooled Effects model (PE), the Fixed Effects model (FE), and the Random Effects model (RE), wherein the FE model can eliminate the influence of individual-variant but time-invariant unobserved confounders. A common panel data regression model can be described through suitable restrictions of the following general model:
$$y_{it} = \alpha_{it} + \beta_{it}^{\top} X_{it} + \varepsilon_{it}$$
where $y$ is the explained variable, $X$ is the explanatory variable, $\alpha$ is the intercept, $\beta$ is the vector of coefficients, $i$ and $t$ are indices for individuals and time, and $\varepsilon$ is the error. In the PE model, for all $i$ and $t$, $\alpha_{it} = \alpha$ and $\beta_{it} = \beta$, that is:
$$y_{it} = \alpha + \beta^{\top} X_{it} + \varepsilon_{it}$$
In the FE model, for all $t$, $\alpha_{it} = \alpha_i$, that is:
$$y_{it} = \alpha_i + \beta^{\top} X_{it} + \varepsilon_{it}$$
In the RE model, $\varepsilon_{it}$ is assumed to vary stochastically over $i$ and $t$, requiring special treatment of the error variance matrix. To determine which model should be chosen, all the models need to be applied in turn. The Breusch-Pagan Lagrange Multiplier test is implemented on the result of the PE model to decide whether the PE model is the most suitable, and the F-test can also be used to choose between FE and PE. The selection between fixed and random effects specifications is based on the Hausman-type test [82].
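To make the difference between the PE and FE specifications concrete, here is a minimal sketch with a single regressor. It uses the standard within transformation (subtracting each individual's mean) for the FE slope; the two-province data set, the true slope of −0.5, and the function names are all invented for illustration and are not the estimation code of the study.

```python
from collections import defaultdict


def pooled_slope(x, y):
    """OLS slope that ignores the panel structure (PE model, one regressor)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def within_slope(entities, x, y):
    """FE slope: demean x and y within each individual, then run OLS."""
    groups = defaultdict(list)
    for e, xi, yi in zip(entities, x, y):
        groups[e].append((xi, yi))
    sxy = sxx = 0.0
    for obs in groups.values():
        mx = sum(o[0] for o in obs) / len(obs)
        my = sum(o[1] for o in obs) / len(obs)
        for xi, yi in obs:
            sxy += (xi - mx) * (yi - my)
            sxx += (xi - mx) ** 2
    return sxy / sxx


# Toy panel: y_it = alpha_i - 0.5 * x_it, with very different intercepts.
entities = ["A", "A", "A", "B", "B", "B"]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
alpha = {"A": 1.0, "B": 10.0}
y = [alpha[e] - 0.5 * xi for e, xi in zip(entities, x)]

print(pooled_slope(x, y))            # distorted by the intercept difference
print(within_slope(entities, x, y))  # recovers the within-individual slope
```

In this toy case the pooled slope even has the wrong sign, because the individual intercepts are correlated with the regressor. That is the kind of time-invariant confounding the FE model removes, and it is why the F-test and Hausman-type test are needed to choose among the three specifications.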

General Description of Green Space and Diseases
In general, decreasing trends over time were observed in all three epidemic diseases, with the average incidence rates of Dysentery, Tuberculosis, and Malaria decreasing from 40.51 to 12.99, 91.84 to 68.85, and 3.81 to 0.19, respectively (Table 1). In contrast, all kinds of green space expanded: the mean logarithm values of Green Space and Public Green Space increased from 5.56 to 8.98 and from 1.08 to 2.10, respectively, and the mean green space coverage ratio increased from 34.18% to 39.17%.

Spatial Distribution of Green Space and Diseases
Figure 5 illustrates the spatial distribution of the incidence rates of the three diseases in 2007 and 2016, indicating that the incidence rates of Dysentery and Tuberculosis show obvious spatially clustered patterns. This interpretation is also supported by the spatial autocorrelation measured with the Global Moran's I statistics in Table 2. There was a clear trend of increasing Moran's I in Dysentery incidence, but a slight decrease in Tuberculosis incidence, from 2007 to 2016 in China; both results were significant at the p = 0.001 level. However, no obvious spatial autocorrelation was detected in the incidence of Malaria.
Further, the Global Moran's I statistics of the control variables in Table 3 reveal that, among the three kinds of green space statistics, only the green space coverage ratio displayed a clustered pattern. Spatial autocorrelation was also observed in the logarithm values of Urban Population, Built-up area, and GDP, which share a similar hierarchical distribution in Figure 6.

Correlation Test and Panel Data Analysis
The results in Figure 7 provide an overview of the correlations among all variables, produced in R with the PerformanceAnalytics package. The chart is quite revealing in several ways. Firstly, all variables, except Malaria and Population density, presented a noteworthy correlation with others. Furthermore, most of the control variables were negatively correlated with the incidences of the three diseases. The temporal variables were significantly associated with other explanatory variables, imposing different effects on diseases and sharing a similar situation with indicators of green spaces. However, the approximately straight lines in the scatter plots implied multicollinearity among variables, which can be straightforwardly observed for Urban Pop, Built-up, and GDP. A further stepwise regression combined with the PDM was carried out to explore the influence of multicollinearity.
The general regression model with a logarithm transformation of epidemic incidence can be written as:
$$\begin{aligned}\ln(\mathrm{incidence}) = {} & a + \beta_1 \ln(\mathrm{Healthworker}) + \beta_2 \ln(\mathrm{GS}) + \beta_3 \ln(\mathrm{PublicGS}) + \beta_4 \ln(\mathrm{GScoverage}) \\ & + \beta_5 \ln(\mathrm{Pop}) + \beta_6 \ln(\mathrm{UrbanPop}) + \beta_7 \ln(\mathrm{Popdensity}) + \beta_8 \ln(\mathrm{Built.up}) + \beta_9 \ln(\mathrm{GDP}) \\ & + \beta_{10}\,\mathrm{AverageT} + \beta_{11}\,\mathrm{HighT} + \beta_{12}\,\mathrm{LowT} + \beta_{13}\,\mathrm{Humidity} + \varepsilon\end{aligned}$$
Based on the PDM, the three models of Pooled Effects (PE), Fixed Effects (FE), and Random Effects (RE) were applied and compared using the Lagrange Multiplier test, the F-test, and the Hausman test. The results in Table 4 indicated that the FE model was better for Dysentery, while the RE models were better for Tuberculosis and Malaria.
All the following tables (Tables 5-7) provide results of traditional PDM and PDM with stepwise regression, indicating that the stepwise regression can also be applied in PDM to eliminate the multicollinearity and optimize the regression model.
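Multicollinearity of this kind is commonly quantified with the Variance Inflation Factor (VIF). For the special case of two predictors, VIF reduces to 1 / (1 − r²), where r is their Pearson correlation, which the sketch below computes. The numbers are invented toy values rather than the study's data, and the general multi-predictor VIF would require regressing each predictor on all the others.

```python
import math


def pearson_r(a, b):
    """Pearson correlation between two equally long sequences."""
    ma = sum(a) / len(a)
    mb = sum(b) / len(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def vif_two_predictors(a, b):
    """VIF of either predictor when the model has exactly two predictors.

    Undefined (division by zero) if the predictors are perfectly collinear.
    """
    r = pearson_r(a, b)
    return 1.0 / (1.0 - r * r)


# Hypothetical log-scale indicators that move almost in lockstep.
ln_gdp = [1.0, 2.0, 3.0, 4.0, 5.0]
ln_builtup = [1.1, 1.9, 3.2, 3.9, 5.1]
# A hypothetical predictor constructed to be uncorrelated with ln_gdp.
unrelated = [2.0, -1.0, -2.0, -1.0, 2.0]

print(vif_two_predictors(ln_gdp, ln_builtup))
print(vif_two_predictors(ln_gdp, unrelated))
```

A stepwise procedure that drops variables until the remaining VIF values are small is one plausible way to arrive at the reduced models reported in the tables, though the exact selection rule used in the study is not stated in the text.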
In Table 5, the results of the FE model of Dysentery reveal that only the indicator of general green space shows the connection with dysentery. Among socioeconomic variables, Population, GDP, and Medical workers are negatively correlated with the incidence of dysentery, while Urban population and Built-up are positive. Average temperature and Humidity also play an important role in reducing the incidence of dysentery. A further step of stepwise regression to eliminate multicollinearity retains only three variables with decreased Variance Inflation Factors (VIF): the number of medical workers, general green space, and relative humidity. For every 1% increase in the number of medical workers and green space area, the incidence of dysentery will decrease by 1.225% and 0.235%, respectively, and for every 1% increase in relative humidity, the incidence will decrease by 0.023%.
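The "1% increase leads to a β% change" reading of a log-log coefficient is a first-order approximation. The sketch below compares it with the exact change implied by ln(y) = a + β·ln(x), assuming that the coefficients behind the percentages quoted above are −1.225 for medical workers and −0.235 for green space area. That assumption, like the function name, is inferred for illustration and is not taken from Table 5 directly.

```python
def pct_change_loglog(beta, pct_increase_x):
    """Exact % change in y for a % increase in x when ln(y) = a + beta * ln(x)."""
    ratio = 1.0 + pct_increase_x / 100.0
    return (ratio ** beta - 1.0) * 100.0


# Coefficients assumed from the percentages quoted in the text.
beta_medical_workers = -1.225
beta_green_space = -0.235

for name, beta in [("medical workers", beta_medical_workers),
                   ("green space", beta_green_space)]:
    small = pct_change_loglog(beta, 1.0)    # close to beta itself
    large = pct_change_loglog(beta, 50.0)   # approximation no longer holds
    print(f"{name}: +1% -> {small:.4f}%, +50% -> {large:.2f}%")
```

For a 1% increase the exact change is slightly smaller in magnitude than the coefficient itself, so the elasticity reading is adequate; for large changes in the explanatory variable, the exact formula should be preferred.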
In Table 6, the results of the RE model of Tuberculosis show that all three green space indicators have significant effects on tuberculosis, wherein GS and Public GS are negative, and GS coverage is positive. An increase in Population, GDP, Urban population, and Lowest temperature may raise the risk of tuberculosis, while Medical workers, Built-up area, average temperature, and relative humidity would reduce the risk. Further stepwise regression keeps five variables, of which GS, Population, and Humidity are positive, and GDP and average temperature are negative. After the process of multicollinearity elimination, GS, GDP, and Humidity show opposite correlations. For every 1% increase in population and relative humidity, the incidence of Tuberculosis would increase by 0.425% and 0.011%, respectively, while for every 1% increase in GDP, it would decrease by 0.619%. An increase in average temperature of one degree Celsius would result in a reduction of Tuberculosis incidence by 0.014%.
The results of the RE model of Malaria in Table 7 indicate that only Public GS and Highest temperature are associated with the incidence of malaria. After stepwise regression, four variables remained, of which GS, Pop Density, and Low T are positive, while High T is negative.

Discussion
The initial purpose of this research was to compare the correlations between various kinds of green space and epidemics. An unexpected finding is that all three kinds of green space display different correlations with epidemics in the PDM, while the results of the PDM with stepwise regression (PDMSR) further indicate that only GS is significantly correlated with the three epidemics. This result can be explained by the apparent collinearity between the variables of Medical workers, Population, Urban Population, Built-up Area, and GDP in Figure 7, with correlations ranging from 0.85 to 0.96. Table 8 also shows that green space has different effects on dysentery, tuberculosis, and malaria in China, which has not been previously described. An increase in green space would help reduce the incidence of dysentery, while increasing the risk of Tuberculosis and Malaria. Such a difference may be explained by the increase in green space that accompanies urban expansion in China. Urban expansion with better socioeconomic conditions will largely reduce the incidence of dysentery by providing better sanitary services [29]. The negative correlation of GDP in the PDM of Tuberculosis also demonstrates the importance of the economy, whereas the positive correlations of Green space and Population suggest that population increases during urbanization may raise the risk of tuberculosis [48,49]. A similar demographic influence can be observed in the PDM of Malaria, wherein the variables of green space and population density are significantly positively correlated with incidence, as opposed to previous findings that there is no correlation between green space and malaria [56,57]. The result could also be related to the potential habitat for mosquitoes provided by green space [55].
Urban expansion on the edge of the city will lead to a simultaneous increase in green space and population, and immigrants with low socioeconomic status may face the threat of epidemics and inadequate medical services. Such results are consistent with previous studies on urbanization and epidemiology [27]. Moreover, socioeconomic development still plays a crucial role in controlling epidemics, which is in line with expectations in the previous literature that urbanization would lead to higher economic conditions, better health care services, and more green space, benefiting the control of epidemics [19][20][21]. The geographical distribution of green space and other socioeconomic variables in Figure 6 and the Global Moran's I index in Table 2 also provide a supplementary explanation for this result. Most of the Chinese provinces with high GDP are located along the coast of China, which have a better quality of green space and other health facilities.
What is surprising is that the models of Tuberculosis and Malaria exclude the variable of medical workers, which represents the level of health care service, and this result seems to be contrary to most of the literature [30]. A possible explanation may be that the incidence of these two epidemics is less connected with medical care conditions and more with heterogeneous environments that carry varying probabilities of contagion [28].
In terms of the temperature and humidity variables, only relative humidity has a negative impact on dysentery, contrary to previous literature which argues that high temperature and humidity can lead to higher dysentery incidence [43]. Such results may be related to the provincial average relative humidity values used in this study. As for tuberculosis, the result confirms that cold weather may increase the likelihood of pathogenic exposure, while higher humidity may contribute to the spread of tuberculosis [51]. In addition, temperature parameters are confirmed to be associated with malaria, as the minimum and maximum temperatures are highly correlated with Malaria. Since the minimum temperature is significantly negatively correlated with the Normalized Difference Vegetation Index (NDVI) [61], the impact of green space on malaria may be further enhanced.

Conclusions
The main contribution of this paper is that we explored the different relations between epidemics and the three kinds of green space, which had rarely been explored and compared together. The result provides more meaningful guidance for public policy on the construction of green space correspondingly. Furthermore, a PDM with stepwise regression can alleviate the influence of multicollinearity and provide more accurate correlations between the variables.
As indicators of green space become more and more popular in evaluating the construction of healthy cities, this article argues that an increase in green space will not straightforwardly lead to better health conditions, especially in the process of urban expansion. The government should pay more attention to the quality of green space in order to promote the physical activity of residents, ensure sufficient sanitary facilities, and reduce mosquito habitat in green space through appropriate landscape design and technologies.
However, the study does have limitations. First, the main weakness of the current study is that it takes provincial boundaries as geographical units, which limits both the spatial resolution and the sample size. Second, although spatial autocorrelation was identified, it was not incorporated into the models; a spatial panel model could be applied to take spatial weights into account. Another topic worth exploring is the link between green space and the infection channels of diseases. For instance, dysentery is transmitted through human contact, food, polluted water, and flies, while tuberculosis and malaria are mainly transmitted through the respiratory tract and by mosquitoes, respectively. Therefore, more detailed studies are needed to explore how green space influences the infection channels of diseases.

Conflicts of Interest:
The authors declare no conflict of interest.