2.1. Study Area
Shanghai, which had a total population of over 23 million as of 2010, is the largest city by population in the People's Republic of China. It sits at the mouth of the Yangtze River in the middle of the Chinese coast, between latitudes 30°41′33.07″ N and 31°52′4.25″ N and longitudes 120°51′12.03″ E and 121°58′49.17″ E, covering a total area of approximately 6,340.5 km². The city borders Jiangsu and Zhejiang Provinces to the west and is bounded to the east by the East China Sea. Occupying part of the alluvial plain of the Yangtze River Delta, Shanghai lies on generally flat, low-lying land, with the exception of some hills in its western regions. Its altitude varies between 3 and 5 m above mean sea level and increases from east to west.
Location of the study area.
Shanghai is an important economic and financial center in China. The city's economic success owes much to rapid industrial development, with the automobile, electronic and communication equipment, petrochemical, steel product, equipment assembly, and biomedicine industries promoted as the six pillar industries. These industries still receive considerable attention, and industrial production continues to grow rapidly. Although they generate substantial economic profit for the city, these industries affect the environment and pose risks to citizens.
2.2. Quantitative Dimension of Vulnerability
In this paper, human vulnerability to chemical hazards comprises three dimensions: exposure, which is the degree to which a human community is in contact with chemicals; sensitivity, which is the degree to which a receptor is affected when exposed to chemical hazards; and coping ability, which is the ability of a receptor to resist or recover from the damage associated with exposure to chemical hazards. Exposure and sensitivity place targets in potential danger, whereas a lack of coping ability reflects the inability of targets to respond to hazards.
Several proximity models are available for measuring human exposure. Because they are simple and require few data, proximity models are widely used in exposure assessment studies. These models assume that exposure at locations nearer to an emission source is higher than at locations farther from the source. However, some parameters, such as the emission rate and the physicochemical characteristics of the emitted substances, are not considered in these models.
Zou et al. developed an emission weighted proximity model (EWPM) to calculate relative individual exposure, building on the traditional proximity model. EWPM considers the emission rate and emission time of each source. In the EWPM exposure formula, Equation (1), the source terms are the emission rate and emission time of the $j$th emission source to which the $i$th receptor is exposed; $D_{i,j}$ is the distance from the $i$th receptor to the $j$th emission source; $m$ is the number of emission sources; and $n$ is the number of receptors.
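As a rough illustration of how an emission-weighted proximity score could be computed, the Python sketch below sums, for each receptor, the emission rate times the emission time of every source, divided by a power of the receptor-to-source distance. The inverse-distance form, the `power` exponent, the 1 m distance floor, and all function and variable names are assumptions made for illustration; they are not taken from Zou et al.

```python
import math

def ewpm_exposure(receptors, sources, power=1.0):
    """Relative exposure of each receptor (illustrative form, not the published one).

    receptors: list of (x, y) coordinates in metres.
    sources:   list of (x, y, rate, time) tuples, one per emission source.
    power:     assumed distance-decay exponent.
    """
    exposures = []
    for rx, ry in receptors:
        total = 0.0
        for sx, sy, rate, time in sources:
            dist = math.hypot(rx - sx, ry - sy)
            dist = max(dist, 1.0)  # assumed floor: avoids division by zero at a source
            total += rate * time / dist ** power
        exposures.append(total)
    return exposures

# Hypothetical data: two receptors and one source.
receptors = [(0.0, 0.0), (1000.0, 0.0)]
sources = [(100.0, 0.0, 2.0, 10.0)]
print(ewpm_exposure(receptors, sources))
```

The receptor closer to the source receives the larger score, which is the behaviour a proximity model is meant to capture.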
However, this model is suitable only when all sources emit the same hazardous substance. Sources in a region may emit different hazardous substances, and a single source may emit more than one type. Therefore, we modify Equation (1) to account for the toxicity of each hazardous substance, giving Equation (2), in which $c$ represents a dangerous chemical and $\mathrm{LD}_{50}(c)$ is the median lethal dose of chemical $c$ (g/kg), that is, the dose, given all at once, that causes the death of half the members of a group of test animals [28]. The LD50 is frequently used as a general indicator of the acute toxicity of a substance [31], and we use this figure to represent toxicity to humans. For the study area, the main corporations or plants handling hazardous chemicals are taken into account when calculating the levels of exposure, which directly reflect the potential hazards to which humans are exposed.
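One way to fold toxicity into a proximity score is to weight each emission by the inverse of the chemical's LD50, so that more toxic substances (lower LD50) contribute more. The sketch below assumes that weighting, an inverse-distance decay, and hypothetical chemical names and LD50 values; it illustrates the idea rather than reproducing Equation (2).

```python
import math

def toxicity_weighted_exposure(receptor, sources, ld50, power=1.0):
    """Toxicity-weighted exposure of one receptor (assumed 1/LD50 weighting).

    receptor: (x, y) in metres.
    sources:  list of dicts with 'xy' and 'emissions' = {chemical: (rate, time)}.
    ld50:     dict mapping chemical name to LD50 in g/kg.
    """
    rx, ry = receptor
    total = 0.0
    for src in sources:
        sx, sy = src["xy"]
        dist = max(math.hypot(rx - sx, ry - sy), 1.0)  # assumed 1 m floor
        for chem, (rate, time) in src["emissions"].items():
            # Lower LD50 means higher toxicity, hence a larger contribution.
            total += rate * time / (ld50[chem] * dist ** power)
    return total

# Hypothetical chemicals and LD50 values, for illustration only.
ld50 = {"chem_a": 0.5, "chem_b": 5.0}
sources = [{"xy": (100.0, 0.0),
            "emissions": {"chem_a": (1.0, 10.0), "chem_b": (1.0, 10.0)}}]
print(toxicity_weighted_exposure((0.0, 0.0), sources, ld50))
```

With equal emission rates and times, the more toxic `chem_a` dominates the score, which is the effect the toxicity term is intended to have.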
Population density, which provides information on the spatial concentration and distribution of people, is used to indicate the sensitivity of the study area in this work. Generally, densely populated areas are more vulnerable to hazards than sparsely populated areas, because a hazard occurring where more people are concentrated results in greater harm. For instance, a large, severe leakage of hydrogen sulfide that passes through an open field presents little danger. By contrast, a relatively weak leakage of the same substance can pose significant risks to human life in a densely populated area. In addition, the widely available open spaces in sparsely populated areas can function as refuges and as disaster recovery bases in times of emergency. In short, the higher the population density and the more compact the area, the heavier the loss a community will suffer when exposed to hazards. Population density is therefore an important indicator of the sensitivity component of human vulnerability in an area, directly reflecting the degree of damage when exposed to hazards.
2.2.3. Coping Capacity
For coping capacity, we select indicators of income, medical service supply, and access to social resources such as hospitals. The gross domestic product (GDP) per capita represents the general income level of an area. A high GDP per capita usually supports the construction of high-quality infrastructure, the installation and maintenance of early warning systems, modern civil protection, and compensation for the costs of reconstruction in disaster-struck areas. With such infrastructure and high-level emergency management, human vulnerability is reduced. Thus, a high GDP per capita results in low vulnerability.
The number of hospital beds per 10,000 population represents the level of medical treatment in an area. This indicator directly reflects the ability of an area to rescue people and provide them with health care. A high value denotes a high level of medical treatment, and hence of health care and rescue. As a result, the harm that chemicals cause to people is reduced, and vulnerability is mitigated.
Access to social resources is critical in resisting chemical hazards. For a community, access to resources such as evacuation routes and hospitals is facilitated by, and correlated with, its proximity to the nearest main road. Therefore, this study uses the distance to the nearest main road to indicate the access of an area to social resources; that is, the longer the distance from an area to the nearest main road, the lower its accessibility for evacuation and rescue during emergencies. Thus, people in areas far from the nearest main road are at high risk of harm, i.e., highly vulnerable.
The three aforementioned indicators reflect the coping capacity of an area. They relate to the capacity to cope with, resist, and respond to the effects of exposure to chemical hazards, and thus significantly affect human vulnerability. GDP per capita is an indirect indicator: it influences vulnerability through the social resources it makes available. Hospital beds per 10,000 population and distance to the nearest main road are direct indicators of vulnerability. The former reflects the ability to provide rescue and health care; the latter reflects access to social resources and the ability to evacuate. In addition, GDP per capita and hospital beds per 10,000 population are negatively related to vulnerability; that is, as either increases, vulnerability decreases.
2.3. Genetic k-Means Clustering and Vulnerability Mapping
To obtain a precise spatial distribution of vulnerability, a 500 m × 500 m geographical grid is used as the basic spatial unit for mapping the vulnerability of Shanghai. Each grid cell is characterized by the values of the five previously described indicators of human vulnerability. The indicators are then normalized as follows:
$$x'_{i,j} = \frac{x_{i,j} - x_{\min,j}}{x_{\max,j} - x_{\min,j}} \qquad (3)$$

$$x'_{i,j} = \frac{x_{\max,j} - x_{i,j}}{x_{\max,j} - x_{\min,j}} \qquad (4)$$

where $x'_{i,j}$ and $x_{i,j}$ respectively represent the normalized and original values of the $j$th indicator of the $i$th grid cell, and $x_{\min,j}$ and $x_{\max,j}$ respectively represent the minimum and maximum values of the $j$th indicator over all grid cells. Equation (3) is applied to the indicators of exposure, population density, and distance to the nearest main road, which show a positive relationship with vulnerability. Equation (4) is applied to the indicators of GDP per capita and hospital beds per 10,000 population, which show a negative relationship with vulnerability. Each normalized indicator ranges from 0 to 1, where 0 is the lowest contribution to human vulnerability and 1 is the highest contribution to human vulnerability.
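The two normalization rules can be sketched in a few lines of Python. The function name, the `positive` flag, the handling of a constant indicator, and the sample values are illustrative assumptions.

```python
def normalize(values, positive=True):
    """Min-max scale one indicator (a list of grid-cell values) to [0, 1].

    positive=True  : larger raw value -> larger contribution to vulnerability.
    positive=False : larger raw value -> smaller contribution to vulnerability.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant indicator carries no contrast, so map it to 0.
        return [0.0 for _ in values]
    span = hi - lo
    if positive:
        return [(v - lo) / span for v in values]
    return [(hi - v) / span for v in values]

density = [100.0, 400.0, 1000.0]   # hypothetical population densities
gdp = [20.0, 50.0, 80.0]           # hypothetical GDP per capita values
print(normalize(density))                # positively related indicator
print(normalize(gdp, positive=False))    # negatively related indicator
```

After scaling, the cell with the highest GDP per capita receives 0 and the cell with the lowest receives 1, so all five indicators point in the same direction before clustering.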
Afterwards, all grid cells of Shanghai are used for cluster analysis, which is performed in the five-dimensional data space spanned by the indicators. The clustering technique used in this paper is an improved k-means clustering whose initial cluster centers are generated by a genetic algorithm (GA).
$k$-means clustering seeks a partition of a data set into $k$ clusters such that data objects in the same cluster are similar to each other and objects from distinct clusters are different from each other. The partition is chosen to minimize the sum of squared errors (SSE), that is, the sum of the squared distances of the data objects from their cluster centers. SSE is a commonly used criterion for measuring the quality of clustering. For partitions with the same $k$, a lower SSE indicates better partition quality. This criterion is defined as follows:

$$\mathrm{SSE} = \sum_{i=1}^{k} \sum_{x_j \in C_i} \lVert x_j - c_i \rVert^2 \qquad (5)$$

where $x_j$ is the $j$th object in cluster $C_i$, and $c_i$ is the center of cluster $C_i$.
The k-means clustering algorithm uses k seed objects as the initial k centers. It consists of three basic operations performed iteratively: assignment of data to clusters, computation of the centers (cluster mean vectors), and an SSE convergence test. However, because the algorithm converges to a local minimum, different initial centers may lead to different final cluster centers. In this study, GA is used to obtain the initial centers for k-means clustering so that high-quality clustering solutions can be identified reliably and efficiently on the basis of the SSE criterion. As a population-based, derivative-free evolutionary optimization strategy, GA searches the solution space globally and is therefore much less likely to be trapped in a poor local optimum of the objective function, although convergence to the global optimum is not guaranteed.
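The three iterated operations, and the dependence of the result on the initial centers, can be seen in a minimal pure-Python sketch. The function names, the tolerance, and the one-dimensional toy data are illustrative assumptions, not the implementation used in the study.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def sse(points, centers, labels):
    """Sum of squared distances of each point to its assigned center."""
    return sum(sq_dist(p, centers[l]) for p, l in zip(points, labels))

def kmeans(points, centers, max_iter=100, tol=1e-9):
    centers = [list(c) for c in centers]
    prev = float("inf")
    for _ in range(max_iter):
        # 1. Assign every object to its nearest center.
        labels = [min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
                  for p in points]
        # 2. Recompute each center as the mean vector of its members.
        for i in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == i]
            if members:  # an empty cluster keeps its previous center
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
        # 3. Stop when the SSE no longer decreases.
        cur = sse(points, centers, labels)
        if prev - cur <= tol:
            break
        prev = cur
    return centers, labels, cur

# Hypothetical 1-D data with three obvious groups.
points = [(0,), (1,), (10,), (11,), (20,), (21,)]
_, _, sse_good = kmeans(points, [(0,), (10,), (20,)])
_, _, sse_poor = kmeans(points, [(0,), (1,), (2,)])
print(sse_good, sse_poor)
```

Both runs converge, but the second stops at a much higher SSE because its seeds all fall in the same group. This is the sensitivity to initialization that motivates choosing the seeds with GA.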
The overall procedure of the genetic $k$-means clustering algorithm is shown in Figure 2. The algorithm begins with the random initialization of a population and the calculation of the fitness values of the population. Each chromosome in the population encodes a set of $k$ cluster centers in a real-number representation. The fitness function of the population is given by Equation (6).
The GA operators, namely selection, crossover, and mutation, are applied repeatedly, and the fitness values of the population are re-evaluated in each generation until the fitness function becomes steady, in the sense that the fitness of the best chromosome does not change for several generations. At this point, GA is said to have converged. The best chromosome at convergence is expected to be close to the global minimum of the SSE. It is then used as the initial centers of the k-means clustering, which refines it into a clustering solution that is likely to be at or near the global optimum.
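The procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions: fitness is taken to be the SSE itself (lower is better), selection is simple truncation with elitism, crossover is one-point over the list of centers, and mutation redraws a coordinate uniformly within the data bounds. The fitness function and operators actually used in the study may differ, and all names and parameter values here are hypothetical.

```python
import random

def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def sse_for_centers(points, centers):
    """SSE when every point is assigned to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)

def ga_initial_centers(points, k, pop_size=20, generations=50,
                       mutation_rate=0.1, patience=10, seed=0):
    rng = random.Random(seed)
    dim = len(points[0])
    lows = [min(p[d] for p in points) for d in range(dim)]
    highs = [max(p[d] for p in points) for d in range(dim)]
    # Each chromosome is a list of k real-valued centers, seeded from the data.
    population = [[list(p) for p in rng.sample(points, k)] for _ in range(pop_size)]
    best, best_sse, stale = None, float("inf"), 0
    for _ in range(generations):
        scored = sorted(population, key=lambda ch: sse_for_centers(points, ch))
        top_sse = sse_for_centers(points, scored[0])
        if top_sse < best_sse - 1e-12:
            best, best_sse, stale = [c[:] for c in scored[0]], top_sse, 0
        else:
            stale += 1
            if stale >= patience:  # best fitness steady: treat GA as converged
                break
        parents = scored[: pop_size // 2]        # truncation selection (assumed)
        children = [[c[:] for c in scored[0]]]   # elitism keeps the best chromosome
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, k - 1) if k > 1 else 0
            child = [c[:] for c in a[:cut] + b[cut:]]  # one-point crossover
            for center in child:                       # uniform mutation
                for d in range(dim):
                    if rng.random() < mutation_rate:
                        center[d] = rng.uniform(lows[d], highs[d])
            children.append(child)
        population = children
    return best, best_sse

def kmeans(points, centers, max_iter=100):
    centers, labels = [list(c) for c in centers], None
    for _ in range(max_iter):
        new_labels = [min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for i in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == i]
            if members:
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels

# Hypothetical data: two well-separated groups of three points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
seed_centers, seed_sse = ga_initial_centers(points, k=2)
centers, labels = kmeans(points, seed_centers)
print(seed_sse, sse_for_centers(points, centers), labels)
```

The GA supplies a good starting point and k-means then refines it, so the final SSE is never higher than the SSE of the GA seed.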
Overall flowchart of genetic k-means clustering.
To determine the optimal number of clusters, we introduce the silhouette coefficient to work in combination with the SSE criterion, because the SSE criterion is sensitive to the number of clusters, $k$. The silhouette coefficient, a popular measure of clustering quality that combines both cohesion and separation [33], is largely independent of the number of clusters, $k$. For object $i$, the silhouette coefficient is expressed as follows:

$$s_i = \frac{b_i - a_i}{\max(a_i, b_i)} \qquad (7)$$

where $a_i$ is the average distance of object $i$ to all other objects in its cluster, and $b_i$ is obtained by calculating, for each cluster not containing object $i$, the average distance of the object to all objects in that cluster and taking the minimum of these values over all such clusters.
An overall measure of the goodness of a clustering can be obtained by calculating the average silhouette coefficient of all objects. For a clustering with $k$ clusters, the average silhouette coefficient is the mean of the silhouette coefficients of all objects belonging to the clusters; that is:

$$\bar{s}_k = \frac{1}{n} \sum_{i=1}^{n} s_i \qquad (8)$$

where $s_i$ is the silhouette coefficient of object $i$ and $n$ is the total number of objects in the data set. The value of the silhouette coefficient can vary between −1 and 1. A higher value indicates better clustering quality.
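A direct, unoptimized Python sketch of the average silhouette coefficient is given below. It uses Euclidean distance and scores singleton clusters as 0; both are common conventions assumed here rather than details stated in the text. The function name and sample data are hypothetical, and at least two clusters are required.

```python
import math

def euclid(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def average_silhouette(points, labels):
    """Mean silhouette coefficient over all objects (needs >= 2 clusters)."""
    n = len(points)
    clusters = sorted(set(labels))
    scores = []
    for i in range(n):
        # Mean distance from object i to the members of each cluster,
        # excluding object i itself.
        mean_dist = {}
        for c in clusters:
            d = [euclid(points[i], points[j])
                 for j in range(n) if labels[j] == c and j != i]
            if d:
                mean_dist[c] = sum(d) / len(d)
        own = labels[i]
        if own not in mean_dist:       # singleton cluster: assumed score of 0
            scores.append(0.0)
            continue
        a = mean_dist[own]                                      # cohesion
        b = min(v for c, v in mean_dist.items() if c != own)    # separation
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / n

# Hypothetical data: two tight pairs far apart.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(average_silhouette(points, [0, 0, 1, 1]))  # natural grouping
print(average_silhouette(points, [0, 1, 0, 1]))  # deliberately poor grouping
```

The natural grouping scores close to 1 and the poor grouping scores below 0, which is why the average silhouette coefficient is useful for comparing clusterings with different numbers of clusters.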
In this study, we conduct genetic k-means clustering analysis of the study area under different values of $k$. We plot the SSE and the average silhouette coefficient against the number of clusters and analyze the two curves to identify the optimal number of clusters, $k_{\mathrm{opt}}$. The clustering result with $k_{\mathrm{opt}}$ clusters is then used to categorize the human vulnerability of the study area, and we map this result in GIS to obtain the vulnerability map.
2.5. Information Entropy Analysis and Vulnerability Evaluation
The concept of entropy was first introduced into information theory by Shannon [34]. In information theory, entropy is a measure of the degree of disorder of a system. A larger entropy value indicates more randomness and thus less information expressed by the data, so entropy measures the amount of useful information that the data provide. Entropy is therefore an objective means of defining the weights of vulnerability indicators on the basis of the useful information in the available data.
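For the study area, the ratio of the value of indicator $j$ in grid $i$ is defined as:

$$p_{i,j} = \frac{x'_{i,j}}{\sum_{i=1}^{n} x'_{i,j}} \qquad (9)$$

where $x'_{i,j}$ is the normalized value of indicator $j$ in grid $i$, and $n$ is the total number of grids in the study area.

Then, the information entropy of indicator $j$ is expressed as:

$$e_j = -\frac{1}{\ln n} \sum_{i=1}^{n} p_{i,j} \ln p_{i,j} \qquad (10)$$

with $p_{i,j} \ln p_{i,j}$ taken as 0 when $p_{i,j} = 0$. Therefore, the importance of indicator $j$ extracted from the data set is calculated by:

$$w_j = \frac{1 - e_j}{\sum_{l=1}^{m} (1 - e_l)} \qquad (11)$$

where $m$ is the number of indicators.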
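The entropy-based weighting can be sketched in Python as below. The sketch assumes the common entropy-weight formulation in which each indicator's entropy is scaled by 1/ln n so that it lies in [0, 1], and in which the weight is proportional to one minus the entropy. It also assumes every indicator has at least one positive value and that the indicators are not all perfectly uniform. Function and variable names are hypothetical.

```python
import math

def entropy_weights(matrix):
    """matrix[i][j]: normalized value (>= 0) of indicator j in grid cell i."""
    n, m = len(matrix), len(matrix[0])
    diversities = []
    for j in range(m):
        column = [row[j] for row in matrix]
        total = sum(column)            # assumed > 0
        entropy = 0.0
        for v in column:
            p = v / total
            if p > 0:                  # 0 * ln(0) is treated as 0
                entropy -= p * math.log(p)
        entropy /= math.log(n)         # scale entropy to the range [0, 1]
        diversities.append(1.0 - entropy)
    s = sum(diversities)               # assumed > 0
    return [d / s for d in diversities]

# Hypothetical data: indicator 0 is concentrated in one cell,
# indicator 1 is identical everywhere.
matrix = [[1.0, 0.5],
          [0.0, 0.5],
          [0.0, 0.5]]
print(entropy_weights(matrix))
```

An indicator that is the same in every grid cell has maximal entropy and receives a weight near zero, whereas an indicator that varies strongly across cells receives a large weight.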
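We evaluate the vulnerability of each cluster of the study area with a weighted sum model of the indicators, using the importance of each indicator calculated by Equation (11) as its weight; that is:

$$V = \sum_{j=1}^{m} w_j \bar{x}_j \qquad (12)$$

where $V$ is the vulnerability of a cluster, $w_j$ is the weight of indicator $j$ from Equation (11), $m$ is the number of indicators, and $\bar{x}_j$ is the mean value of indicator $j$ of a cluster.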
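As a small illustration of the weighted sum model, the Python sketch below averages each indicator within a cluster and combines the means with the indicator weights. It assumes the indicator values are the normalized ones; the function name, weights, and data are hypothetical.

```python
def cluster_vulnerability(matrix, labels, weights):
    """Weighted sum of the per-cluster indicator means.

    matrix[i][j]: (assumed normalized) value of indicator j in grid cell i.
    labels[i]:    cluster label of grid cell i.
    weights[j]:   weight of indicator j.
    """
    scores = {}
    for c in sorted(set(labels)):
        rows = [row for row, l in zip(matrix, labels) if l == c]
        means = [sum(col) / len(rows) for col in zip(*rows)]
        scores[c] = sum(w * x for w, x in zip(weights, means))
    return scores

# Hypothetical data: three grid cells, two indicators, two clusters.
matrix = [[0.0, 0.2],
          [0.2, 0.4],
          [1.0, 0.8]]
labels = [0, 0, 1]
weights = [0.5, 0.5]
print(cluster_vulnerability(matrix, labels, weights))
```

Clusters can then be ranked by their scores to label them as, for example, low or high vulnerability.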