Spatial Analysis of Built Environment Risk for Respiratory Health and Its Implication for Urban Planning: A Case Study of Shanghai

There is evidence that urban planning can promote public health by improving the built environment, and it is increasingly expected to do so. With a focus on respiratory health, this paper explores the impact of the built environment on the incidence of lung cancer and its planning implications. Although lung cancer develops through a complicated and cumulative process, identifying the potential risks posed by the built environment is still valuable. Based on data for 52,009 lung cancer cases in Shanghai, China, from 2009 to 2013, this paper adopts spatial analytical methods to examine the spatial distribution of lung cancer cases. Using a geographic information system (GIS) and Geo-Detector, this paper identifies built environment factors that are correlated with the distribution pattern of lung cancer cases in Shanghai, including the percentage of industrial land (which explains 28% of the spatial variation in incidence), location (11%), and the percentages of cultivated land and green space (6% and 5%, respectively). Based on these quantitative results, the paper proposes planning interventions for respiratory health, such as green buffers. As an ecological study, it demonstrates correlation rather than causation and offers approaches for further research into the causal links between the built environment and disease incidence.


Introduction
Lung cancer has become a serious public health problem throughout the world. According to data from the International Agency for Research on Cancer (IARC) for 2018, lung cancer ranked first among all cancers in both incidence and mortality: there were about 2.09 million new lung cancer cases worldwide, accounting for 11.6% of all new cancer cases, and about 1.76 million lung cancer deaths, accounting for 18.4% of all cancer deaths [1]. Lung cancer also has the highest morbidity and mortality in China. According to data released by the China National Cancer Center in February 2018, there were about 781,000 new lung cancer cases in 2014, accounting for 20.6% of all new cancer cases [2]. Lung cancer mortality in China increased by 465% over the 30 years from 1983 to 2013 [3]. It is therefore important to identify the risk factors for lung cancer and to develop prevention policies across all relevant sectors.
Lung cancer has many pathogenic factors. Recognized risk factors include active and passive smoking, family genetics, air pollution, occupational exposure and diet. Outdoor air pollution was classified as a cause of lung cancer by the IARC of the World Health Organization (WHO) in 2013 [4]. With the rapid growth of motor vehicle traffic and rapid industrialization, air pollution exposure has significant effects on the respiratory health of urban residents. Studies have shown that some built environment variables have significant effects on specific disease risks, such as heart disease, asthma and lung cancer [23,37]. The direct relationship between built environments and health outcomes merits further exploration. This paper, therefore, explores the correlation between specific built environment factors and the incidence of lung cancer. The research questions are as follows: (1) Does the spatial distribution of lung cancer patients show a regular pattern? (2) Is there a correlation between lung cancer and the built environment? Based on the causal path "built environment-outdoor air pollution-lung cancer" established by previous studies, this study selects potential built environment variables that affect outdoor air pollution and then uses spatial analysis methods to explore the spatial distribution pattern of lung cancer and its correlation with built environment factors. Based on the significant built environment factors identified, this paper proposes planning and design strategies to promote respiratory health and the development of healthy cities.

Study Site
This research selected Shanghai as the study site (Figure 1). Located in eastern China, Shanghai is one of the four municipalities directly under the Central Government of China. Its total area is about 6340 square kilometers, with a total population of 24.18 million in 2017. At the time of the Sixth National Population Census in 2010, there were 230 administrative units at the street/town/township level in Shanghai, comprising 99 streets, 110 towns, 2 townships and 19 special administrative units, including 5 industrial parks, 5 comprehensive technology development zones, 1 tourist area and 8 agricultural parks/farms (Figure 2). A street in Shanghai is an administrative unit within the urbanized area at the same administrative level as a town or township. Only four types of spatial unit are analyzed in the study: street, town, township and industrial park. The other types were excluded because some were established after the lung cancer cases were registered and others contain no residents. The street/town/township-level administrative unit is chosen as the basic spatial unit for data aggregation and analysis in this cross-sectional study for two reasons: (1) streets/towns/townships have population bases large enough to keep the error in the calculated incidence small, especially for the age-standardized incidence rate; (2) the functions of street/town/township administrative units in Shanghai are relatively independent and complete, so they better reflect the overall characteristics of the built environment. Shanghai is usually divided into five regions by its ring expressways: within the Inner Ring; from the Inner Ring to the Outer Ring; from the Outer Ring to the Suburban Ring (suburban areas); outside the Suburban Ring (outer suburbs, excluding Chongming District); and Chongming District (Figure 1).
The central city area is inside the Outer Ring, while the suburbs are outside the Outer Ring. The study tracks the heterogeneity among these regions and takes street/town/township as the basic spatial unit for analysis.

Lung Cancer Data
Lung cancer data were provided by the Shanghai Center for Disease Prevention and Control. The data include the sex, age, occupation, smoking history and home address of lung cancer cases confirmed in all hospitals in Shanghai from 2009 to 2013. During this period, 53,805 new cases of lung cancer occurred in Shanghai, including 36,167 males (67.22%) and 17,638 females (32.78%).
Because most cohort studies show that lung cancer is rare in people under 40 years of age, it is difficult to measure its incidence in young people [25]. The population over 86 years old is small, which makes incidence estimates for this group prone to error. The study, therefore, selects cases aged 50-86 for analysis, which account for 90% of all cases. We excluded 1063 cases that could not be geolocated because their home address was missing. Restricting the sample to people who never smoked and did not report passive smoking left 11,244 cases in the study, including 4634 males and 6610 females.
There are several ways to measure the risk of lung cancer in a specific spatial unit. In this study, the standardized incidence rate (SIR) is adopted to represent the risk of lung cancer. Compared with the crude rate, the SIR removes the influence of differences in age structure on the measured incidence. The SIR of lung cancer is calculated as follows:

$$\mathrm{SIR} = \sum_{i} R_i w_i$$

where SIR is the standardized incidence rate, $R_i$ is the age-specific incidence rate of age group $i$, and $w_i$ is the weight of age group $i$. The weight $w_i$ is calculated as follows:

$$w_i = \frac{N_i}{N}$$

where $N_i$ is the population of age group $i$ in the standard population and $N$ is the total population of the standard population. This study takes the population of the whole city as the standard population.
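The SIR calculation described above can be sketched in Python as a direct age standardization. This is a minimal illustration, not the authors' code: it assumes case and population counts are available per spatial unit by age group, and the function names, age-group labels, the multiplier of 10,000 (chosen to match the units reported later) and all numbers are hypothetical.

```python
# Minimal sketch of direct age standardization: SIR = sum_i R_i * w_i, w_i = N_i / N.
# Function names, age-group labels and all numbers below are hypothetical.

def standard_weights(standard_pop):
    """Return w_i = N_i / N for each age group i of the standard population."""
    total = sum(standard_pop.values())
    return {group: n / total for group, n in standard_pop.items()}


def standardized_rate(cases, population, standard_pop, per=10_000):
    """Weighted sum of age-specific rates R_i = cases_i / population_i, per `per` people.

    `cases` and `population` describe one spatial unit (e.g. a street or town);
    `standard_pop` is the city-wide population used as the standard.
    """
    weights = standard_weights(standard_pop)
    rate = 0.0
    for group, w in weights.items():
        pop_i = population.get(group, 0)
        if pop_i > 0:  # an age group with no residents contributes nothing (R_i undefined)
            rate += (cases.get(group, 0) / pop_i) * w
    return rate * per


# Illustrative usage with made-up counts for three age groups
city = {"50-59": 3000, "60-69": 2000, "70-86": 1000}
unit_cases = {"50-59": 2, "60-69": 3, "70-86": 4}
unit_pop = {"50-59": 1000, "60-69": 1000, "70-86": 1000}
print(round(standardized_rate(unit_cases, unit_pop, city), 2))  # 26.67
```

Because the weights sum to 1, a unit whose age-specific rates are all equal to some value r has an SIR of exactly r, whatever its own age structure; this is what makes rates comparable across streets, towns and townships.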

Environmental Factors of Lung Cancer and Their Proxies
Epidemiological studies of lung cancer have identified risk factors such as smoking, passive smoking, familial inheritance, air pollution (outdoor and indoor), occupational exposure and diet [38]. Relevant studies have identified factors of the urban built environment that affect outdoor air pollution, including land use, spatial form, the transportation system, green space and public open space [39]. The scale and layout of specific land use types may affect outdoor air pollution and thus respiratory health; these types include industrial land [40][41][42][43][44], cultivated land [44], residential land [40][41][42][44] and waterbodies [42,44]. Urban spatial form can affect the urban wind environment and thus the emission and diffusion of air pollutants. Commonly used indicators of urban spatial form are building density [41,45,46] and floor area ratio (FAR) [42]. Previous studies have confirmed the impact of transportation systems on lung cancer risk, through distance from main roads and road density [21,22]. The scale and layout of green space have also been shown to have a significant impact on the atmospheric environment [47]. Figure 3 shows the risk factors of lung cancer and their proxies. This study, therefore, adopts the percentage of each land use type in the total area of a geographic unit to reflect land use scale, and parcel density to reflect land use layout: the greater the parcel density, the more dispersed the land use layout. The land use types modeled include industrial, residential, commercial, cultivated and rural developed land, as well as warehouse land and land ready to be built. Although this paper focuses on the built environment, certain environmental factors outside the built-up area, such as cultivated land, are also included in the analysis.
The reason is that the percentage and layout of cultivated land reflect the share of primary industry and the urban-rural character of each street/town/township. Based on the percentage of cultivated land and other indicators, the city can be divided into rural areas, urban-rural mixed areas and urbanized areas. Spatial form variables are not included because building density and FAR cannot effectively represent the spatial form of the suburbs. For the transportation system, this study uses the percentage of road land in the total land area and road density to reflect the basic traffic volume within each street/town/township, and it distinguishes units crossed by an expressway from those that are not. For green space, this study uses the percentage of urban independent green space in the total area of the street/town/township to measure its scale, and green space density to measure its layout.
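The scale and layout proxies described above could be computed per street/town/township roughly as follows. This sketch assumes the vectorized land use map has already been reduced to a list of parcels, each with a land use type and an area. The text does not define parcel density or road density precisely, so parcel density is assumed here to be the number of parcels of a given type per square kilometre and road density the road length per square kilometre; the `Parcel` class and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Parcel:
    """One land parcel inside a spatial unit (hypothetical schema)."""
    land_type: str   # e.g. "industrial", "cultivated", "green"
    area_km2: float


def land_use_percentage(parcels, land_type, unit_area_km2):
    """Scale proxy: share (%) of the unit's area occupied by one land use type."""
    area = sum(p.area_km2 for p in parcels if p.land_type == land_type)
    return 100.0 * area / unit_area_km2


def parcel_density(parcels, land_type, unit_area_km2):
    """Layout proxy (ASSUMED definition): parcels of one type per km2 of the unit.

    For a fixed total area, more parcels means smaller, more scattered parcels,
    i.e. a more dispersed layout.
    """
    count = sum(1 for p in parcels if p.land_type == land_type)
    return count / unit_area_km2


def road_density(road_length_km, unit_area_km2):
    """Transport proxy (ASSUMED definition): km of road per km2 of the unit."""
    return road_length_km / unit_area_km2


# Illustrative usage for a hypothetical 10 km2 town
parcels = [
    Parcel("industrial", 1.5),
    Parcel("industrial", 0.5),
    Parcel("cultivated", 3.0),
    Parcel("green", 0.4),
]
print(land_use_percentage(parcels, "industrial", 10.0))  # 20.0
print(parcel_density(parcels, "industrial", 10.0))       # 0.2
```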
Land use and road system data were derived by vectorizing the Shanghai Land Use Status Map (2011) provided by the Shanghai Municipal Bureau of Planning and Land Use. Population data were obtained from the Sixth National Population Census in 2010, and Point of Interest (POI) data from Baidu Map.

Analytical Methods
Focusing on the spatial distribution pattern of lung cancer cases in Shanghai, this paper explores the correlation between the built environment and health outcomes. The two basic hypotheses are: (1) lung cancer cases are not randomly distributed in the city; and (2) the spatial distribution of lung cancer cases in the city is related to the built environment. To test these hypotheses, the study conducts two analyses: (1) spatial autocorrelation analysis; and (2) spatial stratified heterogeneity analysis.
In the first analysis, the clustering pattern of lung cancer incidence is identified by testing for spatial autocorrelation, which locates regions with high, low and outlying incidence. The resulting spatial distribution pattern provides clues for the subsequent attribution analysis. Global Moran's I and Local Indicators of Spatial Association (LISA) are used to explore the global and local spatial autocorrelation of lung cancer in ArcGIS 10.3 (Esri, Redlands, CA, USA).
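The global and local statistics named above can be illustrated with a small pure-Python sketch. It assumes binary contiguity weights (w_ij = 1 if units i and j share a border, otherwise 0) stored as a dictionary of neighbour lists, and it uses a simple permutation test for the global statistic. ArcGIS may use other weighting schemes and variance normalizations, so this is not a reimplementation of the ArcGIS tools; all function names are hypothetical.

```python
import random


def _deviations(values):
    """z_i = x_i - mean(x)."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def global_morans_i(values, neighbors):
    """Global Moran's I with binary weights: I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2."""
    n = len(values)
    z = _deviations(values)
    s0 = sum(len(nbrs) for nbrs in neighbors.values())  # total weight S0
    cross = sum(z[i] * z[j] for i, nbrs in neighbors.items() for j in nbrs)
    return (n / s0) * cross / sum(zi * zi for zi in z)  # fails if all values are equal


def local_morans_i(values, neighbors):
    """LISA: I_i = (z_i / m2) * sum_j w_ij z_j, with m2 = sum_i z_i^2 / n."""
    n = len(values)
    z = _deviations(values)
    m2 = sum(zi * zi for zi in z) / n
    return [z[i] / m2 * sum(z[j] for j in neighbors.get(i, [])) for i in range(n)]


def permutation_test(values, neighbors, n_perm=999, seed=0):
    """Observed global I and a one-sided pseudo p-value for positive autocorrelation."""
    observed = global_morans_i(values, neighbors)
    rng = random.Random(seed)
    shuffled = list(values)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # randomly reassign the values to the units
        if global_morans_i(shuffled, neighbors) >= observed:
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)


# Illustrative usage: six units in a row, each bordering the next
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
rates = [1, 1, 1, 5, 5, 5]  # low values on one side, high values on the other
i_obs, p = permutation_test(rates, chain)
print(round(i_obs, 3))  # 0.6
```

A positive I indicates that similar values sit next to each other (clustering), and a negative I that neighbours tend to differ; a positive local I_i marks a unit surrounded by similar values (a high-high or low-low cluster), while a negative one marks a spatial outlier.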
In the second analysis, Geo-Detector is used to analyze the spatial stratified heterogeneity of the distribution of lung cancer cases. Geo-Detector is based on the assumption that if an environmental factor plays an important role in a disease, the spatial distributions of the disease and the environmental factor should be similar [48,49]. Based on analysis of variance, Geo-Detector uses the Power of Determinant (PD) value to measure quantitatively how much of the spatial distribution of the disease is explained by an environmental factor. PD is calculated as follows:

$$PD = 1 - \frac{1}{N\sigma^{2}}\sum_{h=1}^{L} N_h \sigma_h^{2}$$

where $h = 1, 2, \ldots, L$ indexes the strata of the environmental factor, $N_h$ is the number of spatial units in stratum $h$, $N$ is the total number of spatial units in the whole region, and $\sigma_h^{2}$ and $\sigma^{2}$ are the variances of the dependent variable within stratum $h$ and across the whole region, respectively. PD ranges from 0 to 1; the larger the PD value, the more of the spatial distribution of the disease is explained by the environmental factor.
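The PD calculation described above can be illustrated with a short sketch of the factor detector. It assumes the dependent variable (for example, the SIR of each unit) and a precomputed stratum label per unit are already available; the function name and data are hypothetical, and the F-test that Geo-Detector uses to judge the significance of PD is omitted.

```python
from collections import defaultdict
from statistics import pvariance


def pd_value(y, strata):
    """Factor detector: PD = 1 - sum_h N_h * sigma_h^2 / (N * sigma^2).

    y      -- dependent variable per spatial unit (e.g. SIR)
    strata -- stratum label h of the environmental factor for each unit
    """
    groups = defaultdict(list)
    for value, h in zip(y, strata):
        groups[h].append(value)
    n = len(y)
    total = n * pvariance(y)  # N * sigma^2; division fails if y is constant
    within = sum(len(g) * pvariance(g) for g in groups.values())  # sum_h N_h * sigma_h^2
    return 1.0 - within / total


# Illustrative usage with made-up data
sir = [2.0, 3.0, 2.5, 9.0, 10.0, 9.5]
industrial_class = ["low", "low", "low", "high", "high", "high"]
print(round(pd_value(sir, industrial_class), 3))  # 0.987
```

Because the weighted within-stratum variance can never exceed the total variance, PD stays between 0 (the strata explain nothing) and 1 (all variation lies between strata).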
Geo-Detector does not assume a linear relationship between environmental factors and the spatial distribution of disease, and it avoids the multicollinearity among independent variables that affects traditional regression models. Many studies have used Geo-Detector to analyze the relationship between disease distribution and built environment factors [50,51]. In this study, the built environment variables that relevant studies have shown to affect outdoor air pollution are first discretized, establishing strata with specific built environment characteristics. Three modules of Geo-Detector are then used: the factor detector, the risk detector and the interaction detector. The factor detector determines the impact of each built environment indicator on lung cancer incidence; the risk detector tests whether mean incidence differs significantly between strata of a variable; and the interaction detector analyzes the interactive effects among built environment, population and economic factors.
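The discretization step and the risk detector described above can be sketched as follows. The text does not state which discretization method was used, so equal-count (quantile) classes are assumed here. The risk detector is shown as the Welch-type t statistic comparing the mean of the dependent variable in two strata, with Welch-Satterthwaite degrees of freedom; converting t into a p-value needs a t-distribution (for example from SciPy), which is left out to keep the sketch dependency-free. Function names and data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def quantile_strata(x, k):
    """Discretize x into k classes of (nearly) equal size by rank: labels 0..k-1.

    Tied values may be split across adjacent classes; a production version
    would handle ties explicitly.
    """
    order = sorted(range(len(x)), key=lambda i: x[i])
    labels = [0] * len(x)
    for rank, i in enumerate(order):
        labels[i] = rank * k // len(x)
    return labels


def risk_detector_t(y_a, y_b):
    """Compare mean y between two strata (each needs at least 2 units).

    t  = (mean_a - mean_b) / sqrt(var_a / n_a + var_b / n_b)
    df = Welch-Satterthwaite approximation
    """
    se_a = variance(y_a) / len(y_a)  # squared standard error of mean_a
    se_b = variance(y_b) / len(y_b)
    t = (mean(y_a) - mean(y_b)) / sqrt(se_a + se_b)
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (len(y_a) - 1) + se_b ** 2 / (len(y_b) - 1))
    return t, df


# Illustrative usage: classify six units by industrial-land share ...
industrial_pct = [5.0, 1.0, 3.0, 2.0, 40.0, 45.0]
print(quantile_strata(industrial_pct, 3))  # [1, 0, 1, 0, 2, 2]

# ... and compare mean SIR between two strata
t, df = risk_detector_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(round(t, 3), round(df, 1))  # -3.674 4.0
```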

Descriptive Statistics of Lung Cancer Incidence and Built Environment Factors
The SIR of street/town/township units in Shanghai ranged from zero to 104.5/10,000, with a mean of 7.86/10,000. The SIR of lung cancer differs by sex and age group (Figure 4). Among these never-smokers who did not report passive smoking, the SIR of women is higher than that of men in most age groups, which suggests that contextual risk factors other than smoking may have a significant effect on women. The distribution of built environment factors varies across Shanghai (Table 1). For most built environment variables, the 75th percentile differs greatly from the maximum, which indicates that certain administrative units have extreme values because of their specific functions. For example, while the mean percentage of industrial land is 8.88%, the maximum is 52.69%, which suggests the existence of master-planned industrial parks. Similarly, while the mean percentage of transportation land is 5.77%, the maximum is 42.33%, which indicates large transport hubs, such as airports and railway stations, in specific administrative units. Although Geo-Detector can handle extreme values effectively, special administrative units still need to be excluded to avoid the influence of extreme values on the results.

Spatial Clustering Characteristics of Lung Cancer Cases
The global spatial autocorrelation test yields Moran's I values of 0.039 (p = 0.001), 0.041 (p = 0.001) and 0.133 (p < 0.001) for all residents, men and women, respectively. The distribution of lung cancer cases in Shanghai therefore shows significant clustering. Clustering is stronger for female than for male cases, which indicates that environmental factors may have a more significant impact on the incidence of lung cancer in women. Hot spot analysis shows that high incidence areas for both male and female lung cancer are located in several large-scale industrial parks (Figure 5). LISA results show that the central urban area within the Inner Ring is also a relatively high incidence area for both men and women, which indicates that high-density urban areas may have a specific impact on lung cancer incidence.

Spatial Stratified Heterogeneity of Lung Cancer Cases
As described above, the factor detector of Geo-Detector is used to quantify the impact of risk factors on lung cancer incidence, and it ranks the variables by PD value as follows: administrative unit type (0.64) > industrial land percentage (0.28) > cultivated land percentage (0.06) > green land percentage (0.05) (Table 2). The type of administrative unit and the percentage of industrial land are thus the two main factors explaining the spatial stratified heterogeneity of lung cancer, while the percentages of cultivated land and green land explain its distribution to a lesser extent.
The risk detector is used to analyze the detailed effects of these variables on the SIR of lung cancer. Table 3 shows the effect of administrative unit type, ordered by urbanization rate, on lung cancer incidence. As the urbanization rate increases, the mean SIR of lung cancer rises from towns (4.21/10,000) to townships (6.86/10,000) and streets (7.29/10,000). The mean SIR in industrial parks (63.52/10,000) is significantly higher, roughly 10 times that of the other administrative unit types. This is consistent with the hot spot analysis and indicates that industrial parks are high-risk areas for lung cancer. Table 4 shows the impact of the percentage of industrial land on lung cancer incidence. When the percentage of industrial land exceeds 30%, the SIR of lung cancer increases significantly, whereas there is no significant difference between units with less than 10% and units with 10% to 30% industrial land. For the percentages of cultivated land and green land, there is no significant difference in lung cancer risk between successive strata of either variable. Because lung cancer risk is high in administrative units containing large-scale industrial parks, the factor and risk detectors are rerun after excluding the industrial park units, to avoid the possible influence of their extreme values.
The new results show that the spatial stratified heterogeneity of lung cancer incidence associated with administrative unit type is no longer significant, nor are the scale and layout of industrial land. Considering the locations of the industrial parks, this indicates that the impact of industrial land on lung cancer is concentrated in industrial parks, especially in the suburbs, rather than in other areas. In the new results, location is significantly correlated with the distribution of lung cancer and explains 10.8% of its spatial variation.
To represent location, five regions are defined by Shanghai's ring expressways. Lung cancer incidence is highest within the Inner Ring (8.08/10,000) and in the outer suburbs (7.81/10,000), followed by the area between the Inner Ring and the Outer Ring (6.56/10,000) and the suburban areas (6.69/10,000). Incidence is lowest in Chongming District (5.00/10,000).
After industrial parks are excluded, the explanatory power of the percentage of cultivated land for lung cancer incidence increases to 11.3%. The relationship between the percentage of cultivated land and lung cancer incidence has an inverted "U" shape: mean incidence is lower in units where cultivated land accounts for more than 60% or less than 1% of the area, and higher where it accounts for 20-60%. This indicates that incidence in the outer suburbs, with their urban-rural mixture, is higher than in both urban and rural areas.
Finally, we use the interaction detector of Geo-Detector to analyze the interactions of built environment variables on lung cancer incidence. Figure 6 shows the interactions between certain land uses and urban-rural type, location and administrative unit type. The PD values of land use variables increase significantly when they interact with urban-rural type and location, which indicates a synergistic effect among land use variables, urban-rural type and location. Furthermore, the interaction between land use percentage and land use layout explains lung cancer incidence better than either factor alone.
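The interaction detector described above can be sketched by overlaying two stratifications, recomputing PD on the combined strata, and comparing the result with the two single-factor PD values using the usual Geo-Detector interaction categories. This is an illustrative sketch with hypothetical names and made-up data, not the software used in the study. Because overlaying two stratifications only refines the strata, the combined PD computed this way can never fall below the larger single-factor PD; the two "weaken" categories are kept for completeness.

```python
from collections import defaultdict
from math import isclose
from statistics import pvariance


def pd_value(y, strata):
    """PD = 1 - sum_h N_h * sigma_h^2 / (N * sigma^2)."""
    groups = defaultdict(list)
    for value, h in zip(y, strata):
        groups[h].append(value)
    within = sum(len(g) * pvariance(g) for g in groups.values())
    return 1.0 - within / (len(y) * pvariance(y))


def interaction_detector(y, strata_1, strata_2):
    """Return (PD of X1, PD of X2, PD of X1 overlaid with X2, interaction type)."""
    q1, q2 = pd_value(y, strata_1), pd_value(y, strata_2)
    q12 = pd_value(y, list(zip(strata_1, strata_2)))  # each (h1, h2) pair is one stratum
    if q12 < min(q1, q2):
        kind = "nonlinear weaken"
    elif q12 < max(q1, q2):
        kind = "uni-variable weaken"
    elif isclose(q12, q1 + q2):
        kind = "independent"
    elif q12 > q1 + q2:
        kind = "nonlinear enhance"
    else:
        kind = "bi-variable enhance"
    return q1, q2, q12, kind


# Illustrative usage: neither factor alone separates high from low SIR,
# but the combination of the two does
land_use = ["low", "low", "high", "high"]
location = ["inner", "outer", "inner", "outer"]
sir = [0.0, 1.0, 1.0, 0.0]
print(interaction_detector(sir, land_use, location)[3])  # nonlinear enhance
```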

Discussion
Clustering and geographic correlation are two important approaches in spatial epidemiology [52]. This study, therefore, analyzes the spatial autocorrelation and stratified heterogeneity of lung cancer cases to explore the geographic correlation between lung cancer incidence and built environment factors. Arguably one of the biggest challenges facing spatial epidemiology is identifying geographic distribution patterns (e.g., outliers, clusters) above and beyond background variation [53]; on this basis, this study developed two hypotheses. The first hypothesis is confirmed by the clustering analysis: the spatial distribution of both male and female lung cancer cases in Shanghai shows significant clustering, and the pattern is more pronounced for female cases. There are two types of high incidence areas: one near large industrial parks and the other in the central urban area. The second hypothesis is also supported: the spatial distribution of lung cancer cases is correlated with the type of administrative unit, geographical location, land use and green space.
Our study found that the type of administrative unit explains 64.3% of the spatial variation of lung cancer cases. The classification of urban administrative units is mainly based on the level and structure of economic development, and it reflects the basic functions and characteristics of a spatial unit. Industrial parks, one of the administrative unit types in this study, consist largely of industrial land (30-50% of total land use) with a relatively concentrated layout; typically, urban residential land, rural construction land and farmland account for around 10%, 5% and 15% of these parks, respectively. This study found that residents living in administrative units classified as industrial parks have a significantly higher risk of lung cancer. Our findings on industrial pollution and lung cancer are consistent with previous studies, including a 1990 review reporting that people living near non-ferrous smelters and various other types of heavy industry in Italy and Spain may have an increased risk of lung cancer [54]. We did not find a significant impact of industrial land on lung cancer outside industrial parks, possibly because other industrial sites are smaller and therefore have little influence on public health, or because the industries there emit little air pollution.
When industrial parks are excluded from the analysis, we find that geographical location, defined by the ring expressways, is the most important environmental factor explaining the spatial distribution of lung cancer. Administrative units in the same location share not only similar built environment characteristics but also similar functional orientations and economic structures.
The results of this study support the finding of a recent epidemiological study of lung cancer in China that incidence is lower in rural than in urban areas [55], as the incidence of lung cancer in Chongming District is significantly lower than in other areas of Shanghai. However, some similar studies classify areas using a simple urban-rural dichotomy and neglect the areas in between. This study finds that the risk in the outer suburbs is significantly higher than in the urban periphery. The outer suburbs are a typical urban-rural mixture, with cultivated land accounting for 50% of total land use on average, whereas the urban periphery is largely urbanized (cultivated land accounts for 26% on average). This is consistent with the risk detector results: compared with urban and rural areas, lung cancer risk is higher in urban-rural mixed areas.
Our study also found that urban areas within the Inner Ring have a high risk of lung cancer, which is consistent with the local spatial autocorrelation results. A similar study found that the relative risk of lung cancer for residents of the city center was 1.5 (95% CI: 1.0-2.2) compared with other residents, possibly due to higher air pollution [56]. Because the high-density urban area within Shanghai's Inner Ring contains no industrial land, its high incidence of lung cancer may be due to traffic and other sources of pollution, which requires further exploration.
The interaction detector results show that urban-rural type and location act synergistically with the built environment in affecting incidence, which indicates that the impact of the built environment on respiratory health differs between urban and rural areas and across locations. The spatial distribution of lung cancer is affected by both the scale and the layout of land use, and the interaction analysis shows that the effect of one built environment factor on lung cancer is modified and enhanced by other built environment factors.
There are two main limitations to this study. First, the impact of workplace exposure on the incidence of lung cancer was not included in the analysis because of a lack of data. Second, this study does not include the effects of the micro-scale community environment and the indoor residential environment. Future research could capture these effects with variables representing the spatial features at these two levels.

Conclusions
The main finding of this study is that large industrial parks and their adjacent areas, high-density central urban areas and the outer suburbs are high incidence areas for lung cancer in Shanghai. Based on the significant variables identified by Geo-Detector, suggested planning interventions include: (1) strengthening protective green belts to mitigate potential pollution from large-scale industrial parks; (2) better environmental regulation, which may be required in the outer suburbs; and (3) better measures to reduce air pollution from heavy traffic in urban areas with high building and road density.
This cross-sectional study attempts to identify potential built environment factors associated with lung cancer, but it cannot reveal the temporal relationship between environmental exposure and health outcomes. Further research is needed to establish causality between the built environment and lung cancer. Specific areas such as the suburbs or the urban center could be examined separately and in more depth, and longitudinal rather than cross-sectional designs could be used to unravel the causal relationship between built environment factors and respiratory health outcomes.