Social-Ecological Patterns of Soil Heavy Metals Based on a Self-Organizing Map (SOM): A Case Study in Beijing, China

The regional management of trace elements in soils requires understanding the interaction between the natural system and human socio-economic activities. In this study, a social-ecological patterns of heavy metals (SEPHM) approach was proposed to identify the heavy metal concentration patterns and processes in different ecoregions of Beijing (China) based on a self-organizing map (SOM). Potential ecological risk index (RI) values of Cr, Ni, Zn, Hg, Cu, As, Cd and Pb were calculated for 1,018 surface soil samples. These data were averaged in accordance with 253 communities and/or towns, and compared with demographic, agriculture structure, geomorphology, climate, land use/cover, and soil-forming parent material to discover the SEPHM. Multivariate statistical techniques were further applied to interpret the control factors of each SEPHM. SOM application clustered the 253 towns into nine groups on the map size of 12 × 7 plane (quantization error 1.809; topographic error, 0.0079). The distribution characteristics and Spearman rank correlation coefficients of RIs were strongly associated with the population density, vegetation index, industrial and mining land percent and road density. The RIs were relatively high in which towns in a highly urbanized area with large human population density exist, while low RIs occurred in mountainous and high vegetation cover areas. The resulting dataset identifies the SEPHM of Beijing and links the apparent results of RIs to driving factors, thus serving as an excellent data source to inform policy makers for legislative and land management actions.

Beijing agricultural soil using complex network theory in order to identify their diffusion evolutionary mechanisms. However, there is still a notable lack of the social-ecological patterns study to elucidate the underlying processes between the natural system and human socio-economic activities with heavy metals and the remediation methods and policies.
Therefore, the objectives of this study were to explore the potential of the SOM approach to identify the SEPHM in Beijing, and to propose individualized approaches to the management of the soil heavy metal pollution.

Study Sites
Beijing, with an estimated area of 16.4 thousand km 2 , is located in the northwestern part of China's north plain, generally between longitude 115°24'-117°30'E and latitude 39°38'-41°05'N. Its elevation slopes downward from 2,250 m in the northwest to 10 m in the southeast, and the mountainous area covers about 62% and plain 38% of the whole area ( Figure 1). The area has a temperate continental monsoonal climate with an annual average temperature of 11.8° (average maximum 26° in July and average minimum −5° in January). The annual average temperature difference is 30.4°, while the daily average temperature difference is 11.4°. Annual precipitation in this area is 470-660 mm, about 60% of which comes in July and August. Annual average evaporation is 1,800-2,000 mm. The area is the source of five big rivers, the Yongding, Chaobai, Beiyun, Jiyun and Daqing. Annual average runoff is about 1.8 × 10 9 m 3 , but had decreased to 1.3 × 10 9 m 3 by the end of the last century. The main soil types include drab soil, brown soil and skeleton soil in mountainous areas, and fluvo-aquic soil in plain areas. The population of the study areas was about 20.69 million, and the vehicle population reached 5.4 million in 2012. Heavy metals management is an important and complicated factor in the development of an ecological environment strategy in Beijing, because its environmental problems might represent the future of the other metropolis in China [37].

Sampling and Sample Processing
In this study, 1,018 soil samples were collected in Beijing in 2006 using an irregular stratified sampling technique based on the agricultural land distribution and land use type maps [4]. All the samples' geographical locations were recorded in the WGS84 geographical system in order to process the data into the Geographical Information System (GIS). More details of the soil sampling procedure can be found in the guidelines described in the monitoring protocol [4,5]. The metal concentrations were determined by the methods described in the Chinese Environmental Quality Standard for Soils [38]. The Cr, Ni, Cu, and Zn concentrations were analyzed by flame atomic absorption spectrophotometry after digestion in a mixture of HCl, HNO 3 , and HClO 4 . Pb and Cd were analyzed by graphite furnace atomic absorption spectrophotometry, and the As concentration was determined by potassium borohydride silver nitrate spectrophotometry. In addition, the Hg concentration was determined by cold atomic absorption spectrophotometry after digestion with a mixture of H 2 SO 4 , HNO 3 , and KMnO 4 . Quality assurance and quality control procedures were conducted using the standard reference material Geochemical Standard Soil.

Social-ecological Data
In China, communities and/or towns are the smallest administrative units and usually act as the basic unit for planning and management purposes [39]. Environmental problems have been especially significant for the development of Beijing as the capital of China. The classification of heavy metals social-ecological patterns at the town-level will provide useful information for the establishing sustainable environmental management strategies in Beijing. Therefore, the town-level scale was the basic unit in this study.
Numerous factors need to be considered for ecoregional characteristics clusters. It is a fundamental point to understand the correlations between ecological factors and heavy metals concentration. According to the previous literature investigations, the major inputs of trace elements to agricultural soils including atmospheric deposition, livestock manures, fertilizers and agrochemicals, sewage irrigation, sewage sludge, and some other sources [7,40]. Table 1 lists the contribution and the rank of each source to each individual heavy metals in soils. The complex sources of heavy metals are quantified and our classification datasets are also based on previous works on the spatial autoregression model [41] and risk grade assessment for heavy metals concentration of Beijing [5,27,42]. Additional geodata are from the Statistics yearbook, field surveys and spatial databases (Table 2, Figure 2).
Town-wise statistics metadata mainly consist of these parameters: human population density of each town (ind.·km −2 ), livestock (unit·km −2 ), fertilizers and agrochemicals input (t·km −2 ), land use cover data (km·km −2 or percent distribution), elevation (m), precipitation (mm). Among them, land use cover data has six categories including Normalized Difference Vegetation Index (NDVI), industrial and mining land (percent distribution), river (km·km −2 ), road (km·km −2 ), single cropped and double cropped land (percent distribution). We gathered detailed information on bovine, ovine, porcine and avian livestock at the town-wise level for the year 2007. Livestock unit (LSU) was calculated in a standardized manner [43]. Overall, they correspond to a total of 60,657 livestock units in the study area. Human population data and fertilizers and agrochemicals input data were obtained from the Statistics yearbook of Beijing. Precipitation data was supported by China Meteorological Data Sharing Service System [44], and the monthly average of precipitation from 2000 to 2010 were calculated.   Land use cover data obtained from Beijing Municipal Bureau of Land and Resources, while the cropping patterns were classified by the Moderate Resolution Imaging Spectroradiometer (MODIS) image [45]. All the related data were stored and managed in the Beijing agricultural resources and economic data system [46]. The eco-toxicity of heavy metals depends to a great degree on their bioavailability in soils and their toxicological factors and soil properties (e.g., soil organic matter, soil pH, mineral contents), when estimating their bio-availability to animals or human health [29]. As for the purpose of land use management, the procedures based on the total heavy metal contents in soils were employed in this study. A number of methods have been suggested to quantify the enrichment of heavy metals in contaminated soils, such as Contamination factor (CF), Enrichment factor (EF), Nemerow index (NI) ， Health risk index (HRI), Potential ecological risk index (RI) [47][48][49]. Among them, CF and RI are the typical representative pollution indexes, which have a wide range of application [5]. RI, also called the Hakanson potential ecological risk index, integrates the "toxic-response" factor and pollutant concentration of a given pollutant. E r i reflects the potential health risk in an ecosystem to a certain degree. The quantitative equation of the RI of a given pollutant was defined as follows: where T r i is the toxic-response factor for a given pollutant, and C r i is the contamination factor. C r i was calculated by the measured concentration of metal i in the sample divided by the reference value. In this study, environmental quality standard secondary grade for soils, a soil limitation to ensure agricultural production and human health was applied [38]. The T r i values of metals were as follows: Hg (40) > Cd (30) > As (10) > Cu (5) = Pb (5) = Ni (5) > Cr (2) > Zn (1) [5]. The following terminologies are used to describe risk levels: E r i < 40, low potential ecological risk; 40 ≤ E r i < 80, moderate potential ecological risk; 80 ≤ E r i < 160, considerable potential ecological risk; 160 ≤ E r i < 320, high potential ecological risk; and E r i > 320, very high ecological risk. To facilitate clustering 253 towns by SOM, the biophysical dataset (such as CF and RI of each elements) was interpolated into a grid by Kriging, and the average of these indices were resampled at a town-wise level to match the socio-economic data for the further utilization of SOM.

SOM Application and Statistical Analysis
A SOM algorithm also known as Kohonen Map or Self-Organizing Feature Map, is an unsupervised neural network based on competitive learning [50,51]. It projects high-dimensional input data onto a low dimensional (usually two-dimensional) space. The machine learning is accomplished by first choosing an output neuron that most closely matches the presented input pattern, then determining a neighborhood of excited neurons around the winner, and finally, updating all of the excited neurons [52]. This process iterates and fine tunes, and it is called self-organizing. The outcome weight vectors of the SOM nodes are allocated to have characteristic data patterns. The similar patterns based on k-means are combined with neighboring regions on the map, while dissimilar patterns are located further apart. An illustration of the flow chart of a SOM application is given in Figure 3. Detailed methodological aspects can be found in other computational papers [26,51,53]. In this study, the size of data used in training was 253 cases (i.e., unit towns) multiplied by 20 environmental parameters ( Table 2). Map size determination is one of the important features in any SOM application [53]. The optimum map size is selected based on minimum values for quantization error (QE) and topographic error (TE) [54,55]. QE exhibits the average distance between each data vector and its 'best matching unit' (BMU), and thus measures map resolution. TE represents the proportion of all data vectors for which 1st and 2nd BMUs are not adjacent, and is used for the measurement of topology preservation [56]. Moreover, QE and TE were adopted to adjust the obtained number of map units, therefore, to minimize errors in performance standard setting. Once the SOM had converged, the U-matrix and K-means algorithm were used in order to find clusters in the nodes of the SOM. To select the best patterning among partitions with different numbers of clusters, the Davies-Bouldin index (DBI) [57] was calculated. The smaller the DBI, the better the clustering. Calculations can be made by using the SOM Toolbox package for Matlab [58]. Cross competitive learning similar patterns are mapped onto neighboring regions on the map, while dissimilar patterns are located further apart (see e.g., [20,57] for further details).
SOM was used to classify the region. A statistical analysis method was employed to facilitate the understanding of the relationship between heavy metals risk index and social-ecological factors. Correlation coefficients were determined using Spearman's rank correlation test where p-values less than 0.05 were considered statistically significant [59]. Data were subjected to one-way ANOVA and Duncan's test was used for multiple comparisons among the zones. All statistical analysis was performed using the SPSS 17 statistical package. Figure 4 depicts the complete framework of the present study.

SOM Application and Clustering
The QE and TE are summarized at the different map sizes (from 40 to 198 map units) in Table 3. An 84-unit map (12 × 7) was selected with a quantization error of 1.809 and a topographic error of 0.0079, as it exhibited the smallest quantization error and topographic error values among the models. Figure 5 shows the U-matrix and cluster arrangements of nine clusters for the variables using the SOM model. U-matrix is the method for discriminating between the groups (nine clusters), and indicating the distances between the groups. The k-means also shows nine clear clusters based on the minimum DBI (DBI = 0.89) ( Table 4). The clusters defined by the U-matrix and k-means methods were consistent with each other. Thus, the communities were classified into nine groups (1-9) based on the U-matrix ( Figure 5).  Figure 5. The U matrix and cluster arrangements of nine clusters for the variables (i.e., 253 towns); each node on the U matrix describes the Euclidean distance between nodes in the SOM; therefore there is one node on the U matrix for every adjacent node on the SOM. Red U-matrix node indicates a large distance and blue nodes indicate a small distance. The distribution patterns of 20 input variables on the SOM plane are shown in Figure 6. The socio-geographical characteristics strongly affected the concentration of heavy metals in soil. The map unit in lower and right nodes showed higher scores for RI. Among these factors, PD, IML and ROD were similarly distributed to the RI. DEM, PREC and NDVI tended to be distributed in the upper left corner, which was relatively opposite to the RI. Agricultural inputs (LSU and FAI) have no significant contribution to the RI, and show higher values in the upper left corner. DC and SC showed a similar distribution, and tended to be related to the agricultural inputs. Soil-forming parent materials might influence the RI. However, this relationship is still unclear in this figure. SOM has the ability to express non-linear relationships, and the complexity in every variable was detected and included in the SOM maps.  Table 2.

Pattern of Heavy Metals Contamination
When considering the differences in factors among the nine clusters, variance analysis was employed and the results are shown in Table 5. For example, PREC, NDVI, LS were higher in cluster 1, while other land use cover PD, RID and ROD were lower in this pattern. DEM and NDVI were both higher in cluster 1, 3 and 4. Livestock Units (LSU), agricultural land (DC and SC), and soil Soil-forming parent material (CL) were higher in cluster 2, 5, and 6. Among the nine clusters, the demographic driver (PD), critical land use cover (IML and ROD), and state risk (RI) showed relatively high values in cluster 7, 8, and 9. These clusters indicate different interaction patterns between ecological systems for heavy metals contamination. For the spatial distribution of nine clusters on the GIS platform (Figure 7), cluster 1 mainly located in the Pinggu district, including parts of Miyun and Shunyi.   A dozen regional patches were found in cluster 2 in the city's peripheral suburbs. Cluster 3 and cluster 4 locate in the north and west quadrant of Beijing, respectively. Cluster 5 and cluster 6 were found in the vicinity of agricultural land in the east quadrant. Cluster 7, 8 and 9 were toward the city center area, relatively, where high eco-risk areas were observed.   Table 6 illustrates the soil samples variations in the amounts of Ni, Cr, Cu, Zn, As, Cd, Pb, and Hg in Beijing under different clusters. The ANOVA results indicated that most of clusters had significant differences in the accumulation of heavy metals in Beijing. Most of the elements had the highest contents in cluster 7, except for Cr and As. Simultaneously, relatively high contents of all elements were found in cluster 9. Cluster 7 and cluster 9 are located in zone III, where there is mainly urban land. Ni, Cr, and Zn were relatively higher in cluster 4, which consisted of the communities located in higher DEM, precipitation and Normalized Difference Vegetation Index (NDVI). The distribution of As was homogeneous in each cluster, which might indicate that the source of As is from the natural indigenous soil minerals in Beijing. The contents of Cu and Zn are not greater in cluster 2, 5, and 6 (zone II) than the cluster 7 and 9 (zone III), suggesting that fertilizers and agricultural input had less impact than the ritual casting of bronze which started over 4,000 years ago and automobile tire wear in modern Beijing. For Cd, Pb and Hg, the classification results provided the best outcomes. The distribution of these elements was separated into two dimensions: a higher concentration in clusters 7 and 9 (urbanized areas), while a lower concentration in the others.

Relationships between Soil Heavy Metals Risk and Environmental Variables
Correlation analysis was performed to investigate the relationships between environmental variables and soil heavy metals among the nine clusters. Most comparison cases in the Spearman correlation matrix showed a clear statistical significance when community characterizations were conducted (Table 7). Table 7. Spearman rank correlation coefficient of variables and heavy metal RI to the town-wise data (n = 253) of nine clusters. Different relationships were detected among the nine clusters (i.e., NDVI is strongly negatively related to the soil heavy metal risk in clusters 5, 6 and 8, while has no significant correlation coefficients in the others). Many more factors have significantly correlated coefficients (p < 0.01) in clusters 5, 6, and 8, indicating that the function of soil heavy metals risk is complicated in these clusters. In contrast, the structure and function are relatively simple in clusters 2, 7, and 9.
As for concrete environmental variables, the increased population density also caused soil heavy metals risk in clusters 3, 5 and 6 located at the suburban vicinity. Livestock and IML were positively related with RI in cluster 1 and cluster 5. Among the factors, most of them were related to the risk of heavy metals in Beijing. However, the area of double cropping, loam and clay in communities were not significantly correlated with RI.

Discussion
Ecology is not able to quantify all the relations between exposure and effects of contaminants such as heavy metals, because of the complexity of the sources and their interactions with human well-being. Thus, ecologists and environmental managers endeavor to find environmental indicators which can represent parts of these complex environmental relationships. In recent decades, there have been a series of classifications of different subjects, such as aquatic ecosystem classifications (AEC), biodiversity conservation classification and forest service classification. Social-ecological patterns represent the comprehensive interaction of substance and energy, which form the live organisms and the environment. Thus, difference-oriented policies should be made based on the local ecosystems, which share a number of basic structural and functional characteristics [10,13]. In this study, SOM was performed to classify the region of Beijing into nine clusters at the town-wise level, and a number of the major factors that control heavy metals contamination were identified in each cluster.
SOM have been proven to be a promising tool for describing the evolution of metal accumulations in terrestrial ecosystems. The SOM projection shifts the complicated structure from high dimensional arrays into the lower dimensional clusters based on the neighborhood relations, which is important to ecological classifications for environmental management. For instance, the spatial generalization of environmental data are measured at single sites as well as in dynamic global change modeling in land use or environmental planning [11,60,61]. Land classifications provide spatial reference systems that may indicate long-term effects (responses), for example, structural and functional, resulting from the bioaccumulation of contaminants. Such effects depend both on the stress intensity and on the ecological characteristics and related sensitivities of the land unit which is exposed [10].
From the results of this study, it can be concluded that the heavy metals contamination patterns are strongly related to human populations, IML, and ROD distribution. Additionally, vegetation and geographical aspects of the land also affected heavy metals concentration. We could hypothesize the sequential pattern as the following: (1) high heavy metals risk areas locate in urbanized areas that exhibit high population and low elevation, which are suitable for increased road and IML density; (2) these areas, are more frequently invaded by vehicular traffic volumes and urbanization; (3) therefore, land use management on community units (at a town level) is needed in the more populated areas in order to reduce the heavy metals risk.
Results from the ANOVA analysis of the element contents show differences among the clusters indicating that heavy element contents in clusters 7, 8 and 9 (the urban area) were higher than the other clusters. Cu, Zn, Cd, Pb and Hg in urban soils usually come from gasoline, car components, oil lubricants, and industrial and incinerator emissions [6,34,[62][63][64]. Although leaded gasoline has been banned in Beijing since 1997, the impact on this area may last for the coming years [6], and also decoration waste deposition from the repairs of time-honored parks was considered [29]. The source of Cd in this area may be from coal combustion, solid wastes such as plastics and automobile tires [29,34]. The casting of bronze ritual figures and the automobile tire wear were considered to be the main sources of Cu and Zn. Hg mainly came from the atmospheric deposition along the roadside. Therefore, traffic volume regulations and air cleaning plans should be implemented in these areas. Clusters 2, 5 and 6 are mainly located in the agricultural areas and are far away from the central city, while Cr, Cu, Zn, Hg exhibited ascending trends among them. These may illustrate that the sources were both from industrial emissions and agrochemical inputs [8,31,33]. Therefore, green agriculture is suitable in these communities, which should decrease the use of the Cd-containing, Cu-containing agricultural material inputs from agricultural activities. Heavy metals in clusters 1, 3 and 4 showed higher level elevation, vegetation index, precipitation and parent materials diversity than other clusters. Restoration management was needed in these areas, which was consistent with the report of the Beijing Government [65].
To meet the land-use management needs, besides the identification of metal accumulation spatial patterns, the consecutive spatial patterns are also an important component of zoning [66]. Integrating the fragmentary zones benefits soil trace elements management at the generalized scale. According to the analysis in Section 4.2, in the present study, the main soil heavy metals contamination patterns could be drawn in three zones. Figure 8A illustrates the spatial distribution of communities in Beijing in accordance with clustering results analysis and SOM model. Zone I was located in the mostly mountainous areas and had relatively high vegetation cover in the west and northwest of Beijing (Figure 2A,B). Zone II consisted of the communities located in the peripheral suburbs, whose predominant land use type was agricultural land ( Figure 2B). Furthermore, Yanqing was also detected and grouped into zone II, where was used to be a large state farm. Zone III Ⅲ was formed with the remaining area, and located inside the 6th ring, and this contains the most recently urbanized area of Beijing ( Figure 2B-D). Regarding the clusters, clusters 1, 3 and 4 were mainly contained in zone I, clusters 2, 5, and 6 were attached to zone II, and the remainder were located in zone III. In order to prove the cluster results we proposed, the current development plan of Beijing was employed for analysis.
In China, the concept of major function oriented zone (MOFZ) was proposed to achieve coordinated regional development and environmental protection based on the territorial functions [67,68]. More recently, the MOFZ of Beijing municipality was released [69] and it clearly stated four functionoriented zones (the capital function core area, the urban function extended district, the new districts of urban development and the ecological preservation districts, see Figure 8B). However, the four districts may not be satisfactory for concrete environmental management policy formulations. The nine zones proposed in the present study were clustered by a bottom-up approach based on SOM. Although the MOFZ was made by political or legal means, the integrated zones in accordance with the SOM and multivariate statistical techniques were consistent with the MOFZ of Beijing, indicating that the classification in this literature was reasonable and suited for the actual conditions of the study area. However, a slight difference could be found between Figure 8A,B. The former zones were not limited by the administrative boundary lines. For example, the propagation pathway of zone III was similar to the new districts of urban development in the downwind southeast quadrant of the city. Moreover, the direction of expansion of the Changping and Fangshan districts should be considered by government decision makers.

Conclusions
Ecoregional synthesis and management classification are the key issues of agricultural soil heavy metal environmental monitoring and management, linking the metals accumulation at individual soil sites with the social-ecological dataset of the area managed [10,13,14]. A total of 1,018 soil samples were investigated in order to conduct a comprehensive monitoring and management of the soil heavy metals in Beijing. Social-ecological datasets were developed according to the possible potential sources of the heavy metals. Self-organizing map (SOM) and multivariate statistics were employed to cluster the data into nine habit types which could reveal the homogeneity and regularity of heavy metal migration. It can be used to further soil environment management and land use planning at a province level.
The main outcomes of the study can be drawn as follows: (i) SOM was verified to be a promising approach for pattern recognition and, in particular, for delineating social-ecological patterns of soil heavy metals; (ii) the main factors that influence the heavy metal concentration in Beijing were associated with the population density, vegetation index, industrial and mining land percent and road density-this is useful information for reference in future research; (iii) the social-ecological patterns of heavy metals in Beijing were detected in nine clusters and mainly categorized into three zones. The results are critical for improving the efficiency of the soil heavy metals management; (iv) classification of social-ecological patterns on soil heavy metals (SEPHM) provides a great deal of information enhancing the risk status source identification at the community scale, although the temporal spatial patterns were merely considered in this study. The spatialization of other factors such as the atmospheric deposition, sewage irrigation and sewage sludge may modify the patterns in a greater refinement. Designing a theoretical framework for combining these perspectives will be an exciting open problem for future analysis.