Improved set pair analysis and its application to environmental impact evaluation of dam break

Abstract: Despite the rapid development of risk analysis in dam engineering, there is a relative absence of research on the environmental impact of dam break. As a systematic theory, set pair analysis deals well with uncertainties, although its results are relatively coarse and prone to distortion. A five-grade connection degree and a generalized set pair potential are introduced to improve traditional set pair analysis. Combined with the index system, an evaluation model of the environmental impact of dam break is established on the basis of generalized set pair analysis. Taking the Shaheji Reservoir dam as an example, the evaluation results of fuzzy mathematics theory and generalized set pair analysis are compared, which verifies the scientific soundness and practicability of the method proposed in this paper. The results show that the evaluation grade of the environmental impact of dam break at Shaheji Reservoir is serious, and appropriate management measures should be taken to reduce the risk.


Introduction
Dams play an extremely important role in flood control, power generation, shipping and water supply, and help to mitigate the world's energy crisis and reduce the risk of global climate change [1]. However, storing water on such a scale generates huge potential energy, which poses a great threat to downstream areas [2]. Dam break produces highly destructive floods [3][4][5][6], results in loss of life and economic loss [7][8][9], and has an enormous negative impact on the environment and ecosystem. In recent years, there have been many serious dam break accidents. As of 24 July 2018, at least twenty people had been killed and more than one hundred had gone missing in the flood caused by the collapse of an under-construction dam that is part of the Xe-Pian Xe-Namnoy hydroelectric power project in southeast Laos [10]. Twenty people were killed and eight went missing because of the dam break at Sheyuegou Reservoir in Xinjiang, China, on 1 August 2018 [11]. More than 188,000 people were evacuated due to the threat of flooding caused by local damage to Northern California's Oroville Dam in February 2017. The dam has two spillways to release water from the lake and prevent overflow, but both had problems [12].
Compared with the concern surrounding loss of life and economic loss [13,14], attention to the environmental impact of dam break developed relatively late. Some scholars have carried out research recently. Wang et al. (2006) quantified the influencing factors of the environmental impact of dam break by following expert experience, and integrated them into environmental impact indices [15]. He et al. (2008) presented an index system for the evaluation of the social and environmental impact of dam break [16]. Zhang et al. (2014) established a comprehensive risk assessment system of dam flood overtopping based on the synthesis of the probability of dam failure and the corresponding loss [17]. Cleary et al. (2015) used several failure scenarios to predict consequences in terms of downstream inundation and damage [18]. Li et al. (2015) proposed a dam risk comprehensive evaluation model based on set pair analysis, combined with the existing laws and regulations of dam failure accident classification rules [19]. Latrubesse et al. (2017) introduced a dam environmental vulnerability index to quantify the current and potential impact of a dam in a basin [20]. An evaluation index system of the consequences of dam break has also been specified on the basis of analysis of hazards, exposure and vulnerability [21]. In addition, Li et al. (2019) established a new coupling evaluation model combined with set pair analysis and variable fuzzy set theory [22]. Magilligan et al. (2019) used detailed field sampling and systematic image analysis to document the immediate and sustained geomorphic adjustments at four failed dams within the urbanized Gills Creek watershed [23]. These methods have made great progress in the analysis of environmental impact factors of dam break. However, they still have some obvious shortcomings arising from their characteristics, such as evaluation values that are too close together and evaluation grades that are not clearly separated.
Set pair analysis, which deals with uncertain problems from internal relations of the system, has been applied to the risk evaluation of dam break. Nevertheless, traditional rough set pair analysis has the problem of contradictory comments and distortion of the evaluation results [22]. Therefore, set pair analysis is here introduced and improved to evaluate the environmental impact of dam break.

Traditional Set Pair Analysis
Set pair analysis was first proposed by the Chinese scholar Zhao Keqin in 1989 [24]. Its core idea is to establish two sets, which have certain connections, for the problem to be studied. The terms identity, difference and opposite are used to describe the characteristics of the two sets of A and B, as shown in Figure 1.
Water 2019, 11

[Figure 1. Identity, difference and opposite relations between set A and set B.]

There are N features of set pair H = (A, B) composed of set A and set B, in which S features are shared by set A and set B, P features are opposite, and the remaining F features are different. The connection degree of the two sets is used to describe the uncertainty of the system quantitatively, as expressed in Equation (1).

$$\mu = \frac{S}{N} + \frac{F}{N}\,i + \frac{P}{N}\,j = a + b\,i + c\,j \tag{1}$$

where µ is the connection degree; a, b and c are the weights of the connection degree, corresponding to identity, difference and opposite respectively; a, b, c ∈ [0, 1] and a + b + c = 1; a and c are relatively certain, whereas b is relatively uncertain; i is the difference coefficient, i ∈ [0, 1]; and j is the opposite coefficient, j = −1.
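As a minimal sketch of Equation (1), the connection degree can be computed directly from the feature counts. The function name `connection_degree` and the sample counts are illustrative assumptions, and the value chosen for the difference coefficient i is arbitrary within the stated range [0, 1].

```python
def connection_degree(S, F, P, i=0.5, j=-1.0):
    """Return (a, b, c, mu) for a set pair with S shared, F differing and P opposite features.

    i is the difference coefficient (assumed value, must lie in [0, 1]);
    j is the opposite coefficient, fixed at -1 in the text.
    """
    N = S + F + P
    if N <= 0:
        raise ValueError("the set pair must have at least one feature")
    if not 0.0 <= i <= 1.0:
        raise ValueError("i must lie in [0, 1]")
    a, b, c = S / N, F / N, P / N      # identity, difference, opposite weights
    mu = a + b * i + c * j             # Equation (1)
    return a, b, c, mu


# Hypothetical set pair: 6 shared, 3 differing and 1 opposite feature
a, b, c, mu = connection_degree(S=6, F=3, P=1, i=0.5)
print(a, b, c, mu)
```

Because b multiplies the uncertain coefficient i, varying i over [0, 1] shows how much of µ depends on the uncertain part of the system.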
As a systematic method for dealing with uncertain problems, set pair analysis has been applied in many fields and has achieved good results. Su et al. (2009) assessed the urban ecosystem health level based on set pair analysis, by which the approximate degree of the real index set to the optimal one was defined and evaluated to describe the relative health state of the concerned urban ecosystems [25]. Hu et al. (2011) proposed a method based on cumulative prospect theory and set pair analysis for dynamic stochastic multi-criteria decision-making problems [26]. Li et al. (2011) introduced set pair analysis for groundwater quality assessment and assigned entropy weight to each index to improve the assessment model [27]. Wang et al. (2017) applied set pair analysis to the risk assessment of water inrush [28]. Cao et al. (2018) proposed an approach to interval-valued intuitionistic stochastic multi-criteria decision-making (MCDM) problems using set pair analysis [29]. Lin et al. (2018) proposed an advance optimized classification method to predict the surrounding rock classification accurately based on set pair analysis and tunnel seismic prediction [30]. Garg et al. (2018) presented an approach to investigate distance measures for connection number sets based on set pair analysis [31].
Some scholars have introduced set pair analysis to risk analysis of water conservancy. He et al. (2007) improved the set pair analysis method for regional WECC (generalized water environment carrying capacity) assessment by introducing the entropy weight method [32]. Other applications in this area have also been reported [37].
Determining the connection degree of the set pair is the key to establishing an evaluation model and obtaining accurate results. In traditional set pair analysis, identity means that the two related sets of an uncertain system belong to the same evaluation grade, and the corresponding connection degree is 1. Opposite means that the two sets belong to interval evaluation grades, that is, grades that are separated rather than adjacent, and the corresponding connection degree is −1. Difference means that they belong to adjacent evaluation grades, and the corresponding connection degree lies in (−1, 1). However, when two sets belong to adjacent grades or interval grades, one may lie on either the superior side or the inferior side of the other. Traditional set pair analysis does not distinguish these cases, although the evaluation results they imply are quite different, so it is difficult to determine the grades of samples precisely. Therefore, the method is improved by exploring the expansibility of the connection degree, introducing the generalized set pair potential, and applying it to an environmental impact evaluation of dam break.

Improvement of Traditional Set Pair Analysis
An expansion of the original connection degree is used to improve traditional set pair analysis. Difference and opposite are subdivided, so that the corresponding connection degrees are divided into five grades: identity, superior difference, inferior difference, superior opposite and inferior opposite, as shown in Figure 2.

When $x_k$ is in the same evaluation grade as $s_k$, $a = 1$ and $b_1 = b_2 = c_1 = c_2 = 0$. When $x_k$ is in an evaluation grade adjacent to that of $s_k$, $b_1$ and $b_2$ correspond to the superior side and inferior side respectively. When $x_k$ is in an interval evaluation grade of $s_k$, $c_1$ and $c_2$ correspond to the superior side and inferior side respectively. Therefore, Equation (1) can be expressed as Equation (2).

$$\mu = a + b_1 i^{+} + b_2 i^{-} + c_1 j^{+} + c_2 j^{-} \tag{2}$$

where the superscripts + and − denote the superior side and inferior side respectively.
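The five-way split described above can be sketched as a lookup on the signed distance between two grade numbers. This is only an illustration under stated assumptions: grades are numbered 1 to 5, a lower grade number is taken to be the superior side, and the function name `relation` is hypothetical. It does not reproduce Equations (3)-(7), which assign the numerical connection degrees.

```python
def relation(index_grade, reference_grade, n_grades=5):
    """Name the connection component that applies to a pair of grades.

    Grades are integers 1..n_grades. Assumption: a lower grade number
    than the reference lies on the superior side, a higher one on the
    inferior side.
    """
    for g in (index_grade, reference_grade):
        if not 1 <= g <= n_grades:
            raise ValueError(f"grade {g} is outside 1..{n_grades}")
    distance = index_grade - reference_grade
    if distance == 0:
        return "identity"                                   # weight a
    side = "superior" if distance < 0 else "inferior"
    kind = "difference" if abs(distance) == 1 else "opposite"
    return f"{side} {kind}"                                 # b1, b2, c1 or c2


for grade in range(1, 6):
    print(grade, relation(grade, 3))
```

With the reference grade fixed at 3, the loop walks through all five components once, which makes the symmetry between the superior and inferior sides explicit.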

Determine the Connection Degrees
The connection degrees of the corresponding five grades can be determined according to the above improvement, as expressed in Equations (3)-(7).
where $s_1$, $s_2$, $s_3$, $s_4$ and $s_5$ are the boundary values of the evaluation grades, and $x_k$ is the evaluation index value. The comprehensive connection degree vector can be calculated according to the connection degrees of all indexes, as expressed in Equation (8).
where ω is the weight of each evaluation index.
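One plausible reading of the aggregation step is a weighted sum of the per-index connection degrees, grade by grade. The sketch below assumes that form; the function name `comprehensive_connection`, the three-index example and all numerical values are hypothetical and are not taken from the case study.

```python
def comprehensive_connection(weights, degrees, tol=1e-9):
    """Combine per-index connection degrees into one value per grade.

    weights: one weight per index, summing to 1
    degrees: one row per index, each row holding one connection degree per grade
    Assumption: the combination is a plain weighted sum.
    """
    if len(weights) != len(degrees):
        raise ValueError("need exactly one weight per index")
    if abs(sum(weights) - 1.0) > tol:
        raise ValueError("weights must sum to 1")
    n_grades = len(degrees[0])
    if any(len(row) != n_grades for row in degrees):
        raise ValueError("every index needs one degree per grade")
    return [
        sum(w * row[g] for w, row in zip(weights, degrees))
        for g in range(n_grades)
    ]


# Hypothetical example: three indexes, five grades
weights = [0.5, 0.3, 0.2]
degrees = [
    [1.0, 0.0, -1.0, -1.0, -1.0],
    [0.0, 1.0, 0.0, -1.0, -1.0],
    [-1.0, 0.0, 1.0, 0.0, -1.0],
]
print(comprehensive_connection(weights, degrees))
```

Checking that the weights sum to 1 before combining keeps each component of the result within the range of the individual connection degrees.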

Determine the Grade of Evaluation
Traditional set pair analysis defines the ratio of identity degree to opposite degree as set pair potential, as expressed in Equation (9).
$$SHI(\mu) = \frac{a}{c} = \frac{a}{c_1 + c_2} \tag{9}$$

where SHI(µ) is the set pair potential, and $c_1$ and $c_2$ are the opposite degrees corresponding to the superior side and inferior side respectively. However, Equation (9) has two persistent problems: (a) it is undefined when c = 0, so this case cannot be handled by set pair potential; (b) in some cases the evaluation result is inconsistent with the set pair potential classification.
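Limitation (a) is easy to see in a short sketch of the traditional set pair potential. The function name `set_pair_potential` and the sample values are illustrative assumptions, and the opposite degree is assumed to be the sum of its superior and inferior parts.

```python
def set_pair_potential(a, c1, c2):
    """Traditional set pair potential: identity degree over opposite degree.

    Assumption: the opposite degree is c = c1 + c2.
    Returns None when c == 0, where the ratio is undefined.
    """
    c = c1 + c2
    if c == 0:
        return None          # limitation (a): no value can be assigned
    return a / c


print(set_pair_potential(0.6, 0.1, 0.1))   # ordinary case
print(set_pair_potential(0.6, 0.0, 0.0))   # c = 0: returns None
```

A sample with no opposite component is exactly the case where the identity degree dominates, yet the traditional ratio gives no usable value for it.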
In order to solve the above-mentioned problems of set pair potential, a generalized set pair potential is proposed, as expressed in Equation (10).
where $SHI(\mu)_G$ is the generalized set pair potential; a is the identity degree; and c is the opposite degree. Based on the generalized set pair potential vector, the grade of a sample is determined according to the principle of maximum generalized set pair potential: the grade with the maximum generalized set pair potential is taken as the grade of the evaluation result, as expressed in Equation (11).
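The principle of maximum generalized set pair potential amounts to selecting the grade whose potential is largest. The sketch below assumes the potentials have already been computed by Equation (10); the function name `evaluation_grade` and the numerical values are hypothetical, while the grade names follow the five severity grades used in this paper.

```python
GRADES = ("slight", "ordinary", "medium", "serious", "extremely serious")


def evaluation_grade(potentials, grades=GRADES):
    """Return the grade whose generalized set pair potential is largest.

    potentials: one value per grade, in the same order as `grades`.
    If several grades tie, the first (least severe) one is returned.
    """
    if len(potentials) != len(grades):
        raise ValueError("need exactly one potential per grade")
    best = max(range(len(potentials)), key=lambda k: potentials[k])
    return grades[best]


# Hypothetical potential vector, not taken from the case study
print(evaluation_grade([0.9, 1.4, 1.1, 0.6, 0.5]))
```

How ties should be resolved is not stated in the text, so returning the first maximum is a design choice of this sketch.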

Selection of Indexes
To date, there is no unified definition of the environmental impact of dam break. In essence, environmental impact refers to the changes in the natural environment and ecological conditions around the reservoir that are caused by dam break. The main factors of natural environmental impact assessment include water, soil, atmosphere, noise and solid waste. Ecological impact is reflected in two aspects: biology and humanistic ecology.
Once a dam is ruptured, the high-speed flow of water will cause erosion and damage to the original river slope, soil and vegetation. The sediment carried by the high-speed flow will deposit in the original river and change its morphology. If there are chemical industries downstream of the dam, the leaked substances will also pollute water and soil. Furthermore, the living environment of the original organisms and human beings will be greatly affected or even destroyed. It is unscientific and unwise to analyze all environmental impact aspects caused by dam break due to the complexity of the problem. According to the principles of scientificity, practicability, typicality, and qualitative and quantitative integration [38], seven indexes are selected to evaluate the environmental impact of dam break combined with existing research and relevant laws and regulations [16], such as the Environmental Protection Law of the People's Republic of China, the Environmental Impact Assessment Law of the People's Republic of China and the Guidelines for Environmental Impact Assessment of Water Conservancy and Hydropower Projects [21,39]. The index system is shown in Figure 3.

Classifications and Grades of Indexes
The severity of each evaluation index is divided into five grades: slight, ordinary, medium, serious and extremely serious. River morphology and vegetation cover are easily quantified, whereas the other five indexes can be assessed qualitatively according to statistical information or specific analysis.
Combined with the characteristics of various indexes [16], their classifications and grades are decided, as shown in Table 1.

Evaluation Process Based on Generalized Set Pair Analysis
(1) Selecting specific indexes according to the characteristics of the object to be evaluated and the typical evaluation index system of Figure 3.
(2) Defining the value of each evaluation index according to Table 1 and analyzing the weights of all indexes according to proper methods.
(3) Calculating the generalized set pair potential of each grade based on set pair potential according to Equation (10).
(4) Determining the evaluation grade based on the principle of maximum generalized set pair potential.

Results
Taking Shaheji Reservoir as an example, generalized set pair analysis is applied to evaluate the environmental impact of dam break.
Shaheji Reservoir is located in Chuzhou City, Anhui Province, China. The dam is 27.4 m high and the reservoir capacity is 211 million m³. The reservoir protects more than 300,000 people, 20,000 hectares of farmland, the Beijing-Shanghai Railway and other assets, as shown in Figure 4. In addition to significant losses in terms of human life and the economy, a dam break would have a serious environmental impact.

Values and Weights of Indexes
River morphology and vegetation coverage can be calculated by using survey data according to "scour or sediment volume per length and width" and "land damage rate and severity" respectively.
Because evaluation of environmental impact in this paper is a kind of risk prediction based on the hypothesis of dam break, it is difficult for previous sampling analysis methods to determine the changes of water quality and soil quality. In addition, direct data of the two indexes are difficult to obtain by simulation. Therefore, combined with the Surface Water Environmental Quality Standard and the Soil Environmental Quality Standard of China, it is proposed that water and soil environment can be classified according to their sensitivity. For example, the higher the quality of raw water or soil environment downstream of the dam, the more sensitive it will be to the impact of flood caused by dam break. In this case, their risk levels are higher. The other indexes can be assigned by experts according to Table 1 combined with engineering practice.
However, due to human subjective factors, there are some uncertainties in determining the values of indexes. In order to facilitate comparative analysis, the results of reference [40] were adopted. In addition, all data are processed into the range [0, 100] to avoid a negative impact on the evaluation results caused by different magnitudes of indexes. Therefore, the values of environmental impact indexes of dam break at Shaheji Reservoir can be determined, as shown in Table 2.
The Analytic Hierarchy Process is used to determine the weight of each environmental impact index of dam break. Through a process of expert scoring, judgment matrix establishment, importance ranking and consistency testing, the weight vector of evaluation indexes is calculated, as expressed in Equation (12).

$$W = [\omega_1 \cdots \omega_j \cdots \omega_7] = [0.068,\ 0.26,\ 0.5,\ 0.078,\ 0.052,\ 0.017,\ 0.025] \tag{12}$$
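The weighting step can be illustrated with a small sketch of the Analytic Hierarchy Process. The judgment matrix used in this study is not given, so the 3 × 3 matrix below is purely hypothetical, and the geometric-mean approximation of the priority vector is an assumption, since the text does not say which variant was applied. The random index values are Saaty's standard ones, and a consistency ratio below 0.1 is the commonly used acceptance threshold.

```python
import math

# Saaty's random consistency index for matrix orders 1..7
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32}


def ahp_weights(matrix):
    """Return (weights, consistency_ratio) for a reciprocal judgment matrix.

    Weights use the geometric-mean approximation of the priority vector
    (an assumed variant of the method).
    """
    n = len(matrix)
    if n not in RANDOM_INDEX or any(len(row) != n for row in matrix):
        raise ValueError("expected a square matrix of order 1..7")
    # Geometric mean of each row, normalized to sum to 1
    means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(means)
    w = [m / total for m in means]
    # Estimate the principal eigenvalue from (A w)_r / w_r
    aw = [sum(matrix[r][k] * w[k] for k in range(n)) for r in range(n)]
    lam_max = sum(aw[r] / w[r] for r in range(n)) / n
    ci = (lam_max - n) / (n - 1) if n > 1 else 0.0
    ri = RANDOM_INDEX[n]
    cr = ci / ri if ri > 0 else 0.0
    return w, cr


# Hypothetical, perfectly consistent judgment matrix
example = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights, cr = ahp_weights(example)
print(weights, cr)

# The weight vector reported in Equation (12) sums to 1
PAPER_WEIGHTS = [0.068, 0.26, 0.5, 0.078, 0.052, 0.017, 0.025]
print(abs(sum(PAPER_WEIGHTS) - 1.0) < 1e-9)
```

The final check only confirms that the reported weights are normalized; it says nothing about the judgment matrix from which they were derived.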

Connection Degree Calculation
According to the values and weights of all indexes, connection degrees corresponding to the five grades can be calculated, as shown in Table 3.
A compound operation is used to obtain the comprehensive connection degree vector according to Equation (8), as expressed in Equation (13).

Evaluation Results
The generalized set pair potential vector corresponding to the five grades of the environmental impact of Shaheji Reservoir dam break is calculated by Equation (10). The results are compared with those of fuzzy mathematics theory to verify the validity of the method, as shown in Table 4.

Discussion
(1) The order of evaluation value of each grade from large to small is extremely serious, serious, medium, ordinary, and slight in fuzzy mathematics theory; whereas in generalized set pair analysis, it is serious, extremely serious, medium, ordinary, and slight. The results of the two methods are similar, showing that the generalized set pair analysis method proposed in this study can be effectively applied to environmental impact evaluation of dam break.
(2) The highest generalized set pair potential of five grades reflects the severity of environmental impact caused by dam break. Considering the weights of all indexes, the evaluation grade according to generalized set pair analysis is "serious", while the result of fuzzy mathematics theory is "extremely serious". However, only one basic index is extremely serious, while three indexes are serious. Therefore, it is more reasonable to define the comprehensive evaluation grade of the environmental impact of Shaheji dam break as serious.
(3) Based on the evaluation results and severities of all indexes, appropriate risk management measures should be taken to reduce the environmental impact of dam break in daily management of Shaheji Reservoir. The focus of management is on those serious or extremely serious indexes.
(4) An appropriate evaluation index system is the basis for accurately evaluating the environmental impact of dam break. Because dams differ in location, downstream natural environment and social development, the index system should be adjusted accordingly; the method proposed in this paper remains applicable in these cases. When a certain kind of index plays a crucial role, for example when there are ecological protection areas or special chemical enterprises downstream that could easily cause great damage, such indexes deserve special attention beyond a larger weight.
(5) Environmental impact analysis of dam break is in its infancy. There is a certain randomness in the evaluation results of various evaluation methods due to the relationship among the indexes. It is necessary to make a more detailed analysis of all indexes and clarify their relative importance in future research. Meanwhile, multiple evaluation methods can be used to verify each other in order to improve the accuracy of the evaluation results.

Conclusions
It is difficult to evaluate the environmental impact of dam break by traditional methods, because the classification of impact is ambiguous and the indexes are related. This paper improves traditional set pair analysis by extending the connection degree to five grades and describing the difference degree and opposite degree more specifically. Meanwhile, the generalized set pair potential is introduced to determine the evaluation grade. This solves a problem of traditional set pair analysis, namely that it cannot distinguish the grades of samples clearly when two sets are located on the superior side or inferior side. Furthermore, the limitation that set pair potential does not apply to the case of c = 0 is overcome. Combined with evaluation indexes, a generalized set pair analysis method is established to evaluate the environmental impact of dam break. Taking the Shaheji dam as an example, the results of the study indicate that generalized set pair analysis is preferable and can be applied successfully. Therefore, this paper proposes a new method for evaluating the environmental impact grade of dam break, which provides guidance for the adoption of targeted control measures to reduce environmental risk.

Conflicts of Interest:
The authors declare no conflict of interest.