In addition, according to the relevant provisions of the Law of the People’s Republic of China on Prevention and Control of Water Pollution [20], risk sources unrelated to water supply and water source protection, such as sewage outlets, direct discharge enterprises and loading and unloading wharfs, are prohibited within primary and secondary water source protection areas. Therefore, if such risk sources appear in the primary or secondary protection areas of water sources, the zones in which they are distributed will be partitioned as separate sub-regions. Moreover, all of these sub-regions are key risk supervision sub-regions irrespective of their risk indexes.
  3.1. Minimum Spanning Tree (MST)-Based Method for Partitioning Risk Source Concentrated Sub-Regions
When the risk sources are regarded as nodes and the lengths of the lines linking them are regarded as weights, all of the lines linking risk sources constitute a weighted graph. If we want to connect all the nodes in the weighted graph with the least number of links, the links selected will have to form a tree [21]. By comparing a certain distance threshold with the distance between two adjacent risk sources on the spanning tree (i.e., the weight of the link between them), we can determine whether these two risk sources are within the same sub-region. Among all spanning trees, the one with the minimum weighted sum is the minimum spanning tree [21]. Therefore, the distance between adjacent risk sources on the minimum spanning tree is the most suitable basis for determining whether adjacent risk sources are within the same sub-region.
If a risk source and its adjacent risk sources are within the same sub-region, the distance between it and its adjacent risk sources should be within a certain threshold, and the minimum distances between risk sources within this sub-region and risk sources within other sub-regions should all be greater than this threshold. A typical example is the risk source f4 shown in Figure 3a. Its distances from the risk sources f5 and f6 are 1.14 km and 1.62 km, respectively (Table 1), while its distance from the risk source f2 is 2.74 km (Table 1). Assuming a threshold of 2 km, the risk sources f4–f6 should be in the same sub-region, whereas the risk source f2 is in another sub-region. There is also the case where the distances between a certain risk source and its adjacent risk sources do not differ much, but some of the distances are slightly greater than the threshold. The risk sources involved in such a case can also be partitioned into one sub-region. A typical example is the risk source f2 shown in Figure 3a. Its distances from the risk sources f1 and f3 are 1.89 km and 2.16 km, respectively (Table 1). If only compared with the threshold of 2 km, the risk source f3 should not be partitioned into the same sub-region as sources f1–f2. However, since 1.89 km and 2.16 km are not much different, and 2.16 km is only slightly larger than 2 km, we can also consider that the risk sources f1–f3 are in the same sub-region in this case. Therefore, based on the above analysis, we can determine the sub-region partitioning conditions on the basis of the minimum spanning tree of risk sources by considering two aspects together: the distances between a risk source and its adjacent risk sources, and the deviation between these distances.
At present, there are already some classic minimum spanning tree algorithms, such as the Kruskal method [22] and the Prim method [23]. In this paper, the Prim algorithm [23] is employed to construct the minimum spanning tree of risk sources. The specific steps are described below. The route between adjacent risk sources on the risk source minimum spanning tree was defined as the node path, whose length was the node distance. Each node distance on the risk source minimum spanning tree was examined in turn; if the partitioning conditions were satisfied, the corresponding node path was interrupted. Accordingly, the original single risk source minimum spanning tree was divided into several minimum spanning trees. The risk sources connected in series by each resulting minimum spanning tree were partitioned into one sub-region.
  3.1.1. Construction of Risk Source Minimum Spanning Tree
The Prim algorithm [23] was used to construct the risk source minimum spanning tree, assuming there were n risk sources in the survey area around the water source and the risk source set was F = {f1, f2, …, fn}. The specific steps were as follows:
          
- (1) The risk source minimum spanning tree gradually connected the risk sources within the entire survey area in series. The set of risk sources already connected by the minimum spanning tree was denoted V; the set of risk sources yet to be connected was denoted U; and the set of node paths on the minimum spanning tree was denoted L. Initially, V = { }, U = F, L = { }.
- (2) Starting from any risk source fi, fi was added to V.
- (3) The risk source fj in U with the smallest distance to any risk source in V was found, and the node path connecting fj to the minimum spanning tree was added to L.
- (4) fj was added to V, and U = F − V.
- (5) Steps (3) and (4) were repeated until U = { }.
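The specific calculation formula for risk source spacing was as follows:

$$d_{ij} = \sqrt{(f_{ix} - f_{jx})^2 + (f_{iy} - f_{jy})^2} \quad (1)$$

where d_ij is the spacing between risk sources fi and fj; fix and fiy are the x and y coordinates of risk source fi; and fjx and fjy are the x and y coordinates of risk source fj, respectively.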
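The construction steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names `node_distance` and `prim_node_paths` and the coordinates are hypothetical, and the spacing is assumed to be the plain Euclidean distance between planar coordinates in km.

```python
import math

def node_distance(a, b):
    """Euclidean spacing between two risk sources given as (x, y) pairs."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def prim_node_paths(coords, start=0):
    """Return the node paths (i, j, distance) of a minimum spanning tree."""
    connected = {start}                              # set V, seeded as in step (2)
    pending = set(range(len(coords))) - connected    # set U
    node_paths = []                                  # set L
    while pending:                                   # step (5): until U is empty
        # Step (3): shortest link from any connected source to a pending one.
        d, i, j = min(
            (node_distance(coords[i], coords[j]), i, j)
            for i in connected
            for j in pending
        )
        node_paths.append((i, j, d))
        connected.add(j)                             # step (4)
        pending.remove(j)
    return node_paths

# Hypothetical coordinates in km, not taken from the paper.
coords = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (5.0, 1.0)]
tree = prim_node_paths(coords)
```

Each pass scans every link between V and U, which is quadratic per step but entirely adequate for the few dozen risk sources typically found in one survey area.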
  3.1.2. Judgment Criteria for Sub-Region Partition
From the perspective of the entire survey area, whether two risk sources could be partitioned into the same sub-region depended on how the distance between them compared with the size of the entire survey area. If the distance between the two was very small relative to the entire survey area, the two risk sources could be partitioned together. Otherwise, the two risk sources should belong to different sub-regions. Thus, the size of the entire survey area should be taken into account when determining the partitioning distance threshold dmin used to judge whether two risk sources were within the same sub-region.
Risk sources posing a threat to water sources were generally distributed along the shoreline. Overall, the spatial size of the survey area could be represented by the unilateral shoreline length S of the river in the survey area where the water source was located. The ratio of dmin to S was set as 0.1, so that:

$$d_{\min} = 0.1\,S \quad (2)$$
According to the provision of the Ministry of Environmental Protection’s Guidelines for Protection of Centralized Drinking Water Source Environment [24], the risk source survey area around water sources covers a 20 km range upstream of the secondary water source protection area. In this paper, S was set as 20 km, so dmin was 2 km.
In addition, the deviation between the node distances di and dj of two connected node paths Li and Lj was measured by the ratio di/dj, whose admissible range was set as [0.83, 1.2]. When di/dj ∈ [0.83, 1.2] and one of the node distances was less than or equal to dmin, the three risk sources connected in series by Li and Lj were partitioned into one sub-region.
Taking into comprehensive consideration the distance between nodes and the deviation between distances of their interconnected nodes, the judgment criteria for sub-region partition were set as follows:
          
where di is the node distance of node path Li; Lj is the node path connected to Li that has the largest node distance; and dj is the node distance of Lj.
  3.1.3. Sub-Region Partitioning Procedure
- (1) The Prim algorithm [23] was used to construct the risk source minimum spanning tree, assuming there were n risk sources in the survey area around water sources. The node path set of the minimum spanning tree was defined as L = {L1, L2, …, Ln−1}, while the node distance set corresponding to the node paths was defined as D = {d1, d2, …, dn−1}.
- (2) The node distances di were examined one by one, from large to small, to determine whether they satisfied Equation (3). If so, the node path Li was interrupted. Accordingly, the minimum spanning tree where Li was located was decomposed into two minimum spanning trees.
- (3) The risk sources connected in series by each resulting minimum spanning tree were partitioned into one sub-region.
  3.1.4. Example of Sub-Region Partitioning
Taking the risk sources f1–f6 in Figure 3a as an example, the risk source minimum spanning tree was constructed starting from the risk source f1. The construction process and results are shown in Figure 3 and Table 1, whereas the resulting minimum spanning tree of risk sources is shown in Figure 4.
By comparing the node distances in Table 1 with the distance threshold of 2 km, it was observed that the node distances of paths L2 and L3 were both greater than the threshold. However, the node distance ratio of path L2 to path L1 (the node path with the largest node distance from L2) was 1.14, which lay within [0.83, 1.2], while the node distance of L1 was 1.89 km, which was less than the distance threshold. Thus, L2 did not satisfy Equation (3), and only path L3 satisfied the interrupt condition. Interrupting path L3 yielded two sub-regions: risk sources f1–f3 were partitioned into sub-region A, and risk sources f4–f6 into sub-region B. The partitioning results are shown in Figure 4.
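The partitioning of this example can be reproduced with a short Python sketch. It rests on several assumptions that go beyond the text: the interrupt condition is inferred from the prose description (a node path is cut when its node distance exceeds dmin, unless its ratio to the longest still-connected neighbouring path lies in [0.83, 1.2] and one of the two distances is at most dmin); the tree topology (f1–f2, f2–f3, f2–f4, f4–f5, f4–f6) is assumed from the distances quoted above; and the function name `partition_sub_regions` is hypothetical.

```python
def partition_sub_regions(n_sources, node_paths, d_min=2.0, ratio_range=(0.83, 1.2)):
    """Cut node paths of a spanning tree and return the resulting sub-regions.

    node_paths: list of (i, j, distance) tuples with 0-based source indices.
    """
    low, high = ratio_range
    kept = list(node_paths)
    # Examine node distances from large to small, as in step (2).
    for path in sorted(node_paths, key=lambda p: p[2], reverse=True):
        i, j, d_i = path
        if d_i <= d_min:
            break  # every remaining path is within the threshold
        # Neighbouring paths that are still connected and share a risk source.
        neighbours = [p for p in kept if p != path and {p[0], p[1]} & {i, j}]
        if neighbours:
            d_j = max(p[2] for p in neighbours)
            if low <= d_i / d_j <= high and min(d_i, d_j) <= d_min:
                continue  # small deviation: keep the path
        kept.remove(path)  # interrupt the node path

    # Group sources joined by the remaining paths (simple union-find).
    parent = list(range(n_sources))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j, _ in kept:
        parent[find(i)] = find(j)
    groups = {}
    for s in range(n_sources):
        groups.setdefault(find(s), []).append(s)
    return sorted(groups.values())

# Indices 0..5 stand for f1..f6; distances (km) are those quoted in the text,
# the topology is an assumption.
paths = [(0, 1, 1.89), (1, 2, 2.16), (1, 3, 2.74), (3, 4, 1.14), (3, 5, 1.62)]
sub_regions = partition_sub_regions(6, paths)
```

Under these assumptions the sketch cuts only the 2.74 km path and returns the groups {f1, f2, f3} and {f4, f5, f6}, matching sub-regions A and B. Because paths are processed from longest to shortest, the 2.16 km path is compared only with the 1.89 km path once the 2.74 km path has been interrupted.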
  3.2. Method for Determining Sub-Regional Risk Indexes
Determination of SrRI should be based on the RIRS of individual risk sources within the sub-regions. Without considering other influencing factors, the greater the mean value of the risk indexes of risk sources (Mean-RIRS) in a sub-region, the greater the value of SrRI theoretically. In addition, considering that risk sources with larger RIRS had a far greater risk impact on the sub-regions than those with smaller RIRS, the maximal value of the risk indexes of risk sources (Max-RIRS) in a sub-region should also be taken into account in determining SrRI. However, considering only the RIRS of the various risk sources in a sub-region could not accurately reflect SrRI. If the RIRS of the individual risk sources in the sub-regions were the same, the larger the number of risk sources in a sub-region, the greater the sum of RIRS, the higher the risk threats to the surrounding waters, and the greater the SrRI theoretically. Therefore, besides the RIRS of the various risk sources in a sub-region, the influence of the number of risk sources on SrRI needs to be considered. Moreover, in economically developed areas industry is densely distributed and risk sources are close to one another, so a fire, explosion or other accident at one risk source may trigger sequential fires or explosions at multiple risk sources (as in the Tianjin Port explosion that occurred in August 2015). Compared with a fire or explosion accident at a single risk source, sequential fires or explosions at multiple risk sources will greatly increase the probability of water pollution accidents. Consequently, when determining SrRI, the concentration degree of risk source distribution in sub-regions should also be considered.
Based on the above analysis, determination of sub-regional risk indexes needs to take into account factors such as Mean-RIRS, Max-RIRS, number of risk sources and degree of risk concentration within a sub-region. The specific calculation formula was as follows:
        
where Kt is the risk index of the t-th sub-region, and m is the number of risk sources in the t-th sub-region. The average risk index of the risk sources in the t-th sub-region and kti, the risk index of the i-th risk source in the t-th sub-region, reflect the risk scale of individual risk sources within a sub-region. kmax is the maximum value that the risk index of an individual risk source can assume. Gt is the risk distribution density index (RDDI) of the t-th sub-region, which reflects the relative degree of risk concentration in the t-th sub-region. Dt is the risk source quantity index (RSQI) of the t-th sub-region, which reflects the risk increment in the t-th sub-region caused by an increased number of risk sources.
  3.2.1. Method for Determining RIRS
RIRS should be determined by comprehensively considering influential factors such as industry type, production scale, technological level, wastewater complexity and risk supervision and emergency response capability [25]. However, except for the production scale (which can be represented by wastewater discharge capacity for industrial sources and by berthing capacity for wharf sources), most of these factors can only be described qualitatively in terms of their scale of risk threats to the surrounding areas. Hence, RIRS can hardly be determined in a fully quantitative way. In contrast, the semi-quantitative comprehensive index method is based on the calculation of a comprehensive evaluation index that summarizes the indexes of multiple risk elements using weight values [15]. This semi-quantitative method has been widely used to assess the risk of chemical and petrochemical areas [18] and mining areas [26]. Therefore, in this paper, RIRS was assessed using the comprehensive index method. The specific assessment steps were as follows: (1) Each assessment index of individual risk sources was graded according to the relevant risk grading criteria and scored. The risk levels were classified into four categories: very low, low, medium and high, which corresponded to four scores: 1, 2, 3 and 4, respectively; (2) RIRS were determined by weighted summation based on the weight and score values of the indexes. Since the sum of index weights was 1, and the maximum score for each index was 4, the maximum value reachable by RIRS was 4, i.e., kmax = 4. The risk grading criteria and specific weights of indexes are shown in Table 2 [25]. Risk levels of risk sources were determined according to the RIRS based on Table 2 [25]. Table 2 could also be used as the sub-regional risk grading criteria.
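The weighted summation in step (2) can be illustrated with a brief Python sketch. The index names and weights below are placeholders chosen only so that the weights sum to 1; they are not the values of Table 2, and the function name `rirs` is hypothetical.

```python
K_MAX = 4  # maximum score of any index, hence the maximum RIRS

def rirs(scores, weights):
    """Weighted sum of index scores (each 1..4) with weights summing to 1."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same indexes")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("index weights must sum to 1")
    if any(not 1 <= s <= K_MAX for s in scores.values()):
        raise ValueError("scores must lie between 1 and 4")
    return sum(weights[name] * scores[name] for name in weights)

# Placeholder weights, NOT the weights of Table 2.
weights = {
    "industry_type": 0.25,
    "production_scale": 0.25,
    "technological_level": 0.20,
    "wastewater_complexity": 0.15,
    "supervision_and_emergency": 0.15,
}
# Hypothetical scores for one risk source (1 = very low ... 4 = high).
scores = {
    "industry_type": 4,
    "production_scale": 3,
    "technological_level": 2,
    "wastewater_complexity": 3,
    "supervision_and_emergency": 2,
}
example_rirs = rirs(scores, weights)
```

With all five scores at 4 the function returns kmax = 4, and with all scores at 1 it returns 1, so the RIRS of any source falls in the range [1, 4].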
  3.2.2. Method for Determining Sub-Regional RDDI
RDDI of multiple risk sources within sub-region was determined based on the relative degree of risk concentration between adjacent risk sources on the risk source minimum spanning tree.
The relative degree of risk concentration between adjacent risk sources connected by the i-th node path Lti on the risk source minimum spanning tree in the t-th sub-region could be determined by comparing the node distance dti corresponding to Lti with the risk distance threshold dmax. The specific formula was as follows:
          
where wti is the index reflecting the relative degree of risk concentration between adjacent risk sources connected by the node path Lti, and wti ∈ (0, 1]. A value of wti equal to 1 indicated that the two risk sources were completely concentrated; otherwise, it indicated relative concentration. The closer the value of wti was to 0, the lower the degree of concentration between the two risk sources.
The risk distance threshold dmax was determined based on the length of the unilateral river shoreline S reflecting the spatial size of the survey area. The specific formula was as follows:

$$d_{\max} = 0.05\,S$$
The Ministry of Environmental Protection’s Guidelines for Protection of Centralized Drinking Water Source Environment [27] provides that the risk source survey area around water sources covers a 20 km range upstream of the secondary water source protection area. S was taken as 20 km, so dmax was 1 km.
The RDDI of the t-th sub-region was calculated as follows:
          
where Gt is the RDDI of the t-th sub-region, and Gt ∈ (0, 1]. The closer the value of Gt to 0, the more dispersed the risks in the t-th sub-region, and vice versa.
 and 
 denote the risk indexes of risk sources at the upper and lower ends of node path 
Lti, respectively. The meanings of m and wti are the same as above.
  3.2.3. Method for Determining Sub-Regional RSQI
If the impact of the risk indexes of risk sources was not considered, the larger the number of risk sources within a sub-region, the greater the water environmental risk in the sub-region. RSQI was determined by assuming that the increment of risk source quantity in the t-th sub-region was linearly related to the increment of sub-regional risk. The specific formula was as follows:
          
where Dt and m have the same meanings as above.
  3.3. Method for Determining Water Source Perimeter Key Risk Supervision Sub-Regions
Key risk supervision sub-regions around water sources were determined on the basis of each sub-region’s scale of risk threats to the water sources. In contrast, the aforementioned sub-regional risk index characterized the sub-region’s scale of risk threats to the sensitive water environment receptors surrounding it. The larger the SrRI, the more severely water quality in the sensitive waters would exceed standards after a pollution accident, and the greater the risk threats to these sensitive waters. This holds when the receptor is close to the sub-region. If the distance between the two increased, the sub-region’s risk threats to the sensitive receptor should be reduced. Similarly, whether the sub-region and the sensitive receptor were on the same shoreline, and whether the sub-region was located upstream of the sensitive receptor, would also affect the sub-region’s scale of threats to the receptor.
Based on the above analysis, SrTWS was determined in this paper via the SrRI and the positional relationship between sub-region and water source. The positional relationship between sub-region and water source includes: distance between the sub-region and the water source in the direction of water body’s forward flow, i.e., the x directional distance; whether the sub-region and the water source were on the same shoreline, if not, distance between the two in the vertical direction of water body’s forward flow, i.e., the y directional distance; and in the case of reciprocating flow of water body, the upstream–downstream positional relationship between the sub-region and the water source.
The sub-regional risk supervision level was determined according to SrTWS, thereby identifying the key risk supervision sub-regions around water sources. Risk supervision grading criteria were established based on the risk grading criteria for risk sources [25]. The details are shown in Table 3.
Specific calculation formula for SrTWS was as follows:
        
where WGt is the SrTWS of the t-th sub-region; and Xt and Yt are the adjustment coefficients determined by considering the magnitudes of the distances between the t-th sub-region and the water source in the x and y directions, respectively. When the sub-region and the water source were located on the same shoreline, Yt was 1. Pt is the adjustment coefficient considering the upstream–downstream positional relationship between the t-th sub-region and the water source. Gt has the same meaning as above.
According to the provision of the Ministry of Environmental Protection’s Guidelines for Protection of Centralized Drinking Water Source Environment [24], the risk source survey area around water sources covers a 20 km range upstream of the secondary water source protection area. In this paper, SrTWS was considered equal to the sub-regional risk index when the x directional distance between the sub-region and the secondary water source protection area was within 10 km, irrespective of other factors. When the x directional distance between the sub-region and the secondary water source protection area was greater than 10 km, SrTWS decreased linearly with increasing distance. Xt was determined according to this principle, and its calculation formulas were:
        
where Stx is the integrated distance (km) from the multiple risk sources within the t-th sub-region to the water intake in the x direction. If the t-th sub-region was located upstream of the water source, Stx > 0; otherwise, Stx < 0. LW is the distance (km) from the source water intake to the upstream or downstream secondary protection area boundary; when the sub-region was located downstream of the water source, LW was the distance from the water intake to the downstream secondary protection area boundary. m is the number of risk sources within the t-th sub-region. stix and kti are the x directional distance (km) from the i-th risk source within the t-th sub-region to the water intake and the risk index of the i-th risk source, respectively. When the risk source was located upstream of the water source, stix was positive; otherwise, it was negative. Xt has the same meaning as above.
As the lateral flow of the river was small, sub-regions located on the opposite shore from a water source posed less threat to it. This is also reflected in the setting of water source protection areas, whose width is significantly less than their length. By analogy with the method for determining the Xt coefficient, Yt in the case where the sub-region and the water source were not on the same shoreline was determined with the protection area width as the standard. The specific calculation formulas were as follows:
        
where Sty and WIp are the y directional integrated distance (km) from the t-th sub-region to the shoreline on which the water source is located and the width (km) of the secondary water source protection area at the source water intake, respectively. stiy is the y directional distance (km) from the i-th risk source within the t-th sub-region to the water intake. Yt and kti have the same meanings as above.
For reciprocating flow, the sub-regions located downstream of water sources posed risk threats to the water sources only through reverse flow, and these threats were smaller than those posed by sub-regions upstream of the water sources. The influence of the upstream–downstream positional relationship between sub-regions and water sources on the SrTWS could be reflected by the forward and reverse flow durations at the water sources. The specific formula was as follows:
        
where TF and TO are the forward and reverse flow durations of the water source, respectively; and Pt has the same meaning as above.