Assessment of Rough Set Theory in Relation to Risks Regarding Hydraulic Engineering Investment Decisions

Abstract: Rough set theory is a mathematical tool for describing imperfection and uncertainty. In the rough set theory of knowledge, the numerical values of some features or attributes are not required. Through data reduction, this article analyzes the investment decision of hydraulic engineering and obtains the following rules: (i) when the construction expense of the hydraulic engineering is low but the financial income is high, investment in the construction project can be selected; (ii) when the expense of the hydraulic construction project is low and the external influence is common or the financial expense is common, if the external influence is low or the financial income is high, the investment can be delayed; (iii) when the construction strategic benefit of the hydraulic engineering is low, the decision rule of no investment can be selected. These findings provide scientific information for investment decisions in hydraulic engineering.


Introduction
Rough set theory (also called rough set) is a mathematical tool proposed by Professor Pawlak in 1982 that can quantitatively analyze and process inaccurate, inconsistent, and incomplete information and knowledge [1]. The original prototype of rough set theory arose from a simple information model. Its basic idea is to use a relational database to form concepts and rules, and to realize knowledge discovery through classification by equivalence relations and approximation of target concepts.
The core of rough set theory and its applications is a pair of approximation operators derived from an approximation space, namely the upper approximation operator and the lower approximation operator (also called the upper and lower approximation sets). The indiscernibility relation of the classic Pawlak model is an equivalence relation, which is a demanding requirement that restricts the application of the rough set model. Therefore, how to extend and define the approximation operators has become an important area of rough set theory research [2].
At present, there are two common research methods for extending rough set theory, namely the structural method and the axiomatic method. The structural method takes binary relations, partitions, coverings, neighborhood systems and Boolean algebras as basic elements, defines the rough approximation operators and derives a rough set algebra system. The basic element of the axiomatic method is a pair of unary approximation operators satisfying certain axioms. Some axioms on the approximation operators ensure the existence of special types of binary relations; conversely, the approximation operators derived from a binary relation by the structural method must satisfy certain axioms.
The most remarkable difference between rough set theory and other theories for handling uncertain and inaccurate problems is that rough set theory needs no prior information beyond the dataset being processed; thus, its description and handling of uncertainty is objective. Since the theory contains no mechanism for processing inaccurate or uncertain original data, it is highly complementary to probability theory, fuzzy mathematics, evidence theory and other theories for handling uncertain or inaccurate problems. Therefore, the relationship between rough set theory and these theories is one of the topics of interest in rough set theory research.
Because of its novel ideas and unique methods, rough set theory has become an important intelligent information processing technology [3,4] that is widely used in machine learning and knowledge discovery, data mining, and decision support and analysis. According to statistics, the forecast error rate of rough sets applied to the engineering field is 6.67% and the average error rate is 10.27% [5]. Rough sets are an advanced decision tool. Many scholars have applied rough set theory to fields such as industrial control [6][7][8]; medicine, health and biological science [9][10][11]; traffic and transportation [12][13][14]; agricultural science [15,16]; environmental science and environmental protection management [17]; safety science [18]; social science [19]; and aviation, spaceflight and military affairs [20,21]. When the evidence or information provided is insufficient, direct acceptance or rejection may be unreasonable; that is, the cost of accepting or rejecting may be greater than the cost of not making a decision [22]. Decision-making models based on Bayesian theory have been extensively studied. Any decision, whether acceptance or rejection, carries a certain risk of loss [23]. The loss function is based on the past experience of experts, and it has different domains in decision theory.
Based on further application of rough set theory in engineering investment decision-making, this paper makes the following contributions: (1) it applies rough set theory to the optimization of water conservancy project investment decision-making, which extends the application of rough set theory; (2) the tacit knowledge discovered with rough sets provides a scientific basis for water conservancy project investment decision-making, showing that different combinations of conditions call for different investment decision-making strategies in water conservancy projects; (3) the data processed by the rough set method are discrete, and, unlike other methods, the rough set method can handle discrete data directly, which makes the conclusions more reliable. Because rough sets realize knowledge discovery through classification by equivalence relations and approximation of target concepts, this article applies rough sets to hydraulic engineering investment decisions, which helps make decision-making more scientific.

Knowledge Theory Based on the Rough Set
Rough set theory is a mathematical tool for describing imperfection and uncertainty. It can effectively analyze various kinds of imprecise, inconsistent, incomplete, or imperfect information and find the hidden knowledge in it, which can then be used to discover potential rules.

Information System and Decision Table
The information (knowledge expression) system S can be expressed as $S = (U, A, V, f)$, where U is the set of objects, i.e., the domain; $A = C \cup D$ is the attribute set, in which the subsets C and D are the condition attribute set and the decision attribute set, respectively, and $C \cap D = \emptyset$; V is the codomain of the object attributes, $V = \bigcup_{a \in A} V_a$, where $V_a$ is the range of values of attribute $a \in A$; and f is the information function, which specifies the value of each attribute for each object, $f: U \times A \to V$, $f(x, a) \in V_a$.
For objects $x, y \in U$ and a condition attribute subset $R \subseteq C$, if $f(x, a) = f(y, a)$ holds for every $a \in R$, objects x and y are said to be indiscernible under R; otherwise, x and y are discernible. For every condition attribute subset R, the indiscernibility relation IND(R) is defined as:
$$\mathrm{IND}(R) = \{(x, y) \in U \times U : f(x, a) = f(y, a) \ \text{for all} \ a \in R\}$$
For $x \in U$, the equivalence class $[x]_R$ of object x determined by the condition attribute subset R is:
$$[x]_R = \{y \in U : (x, y) \in \mathrm{IND}(R)\}$$
The decision table is an important knowledge expression system: a knowledge expression system with condition attributes and decision attributes is a decision table. The header of the decision table lists the attributes. Each row denotes one member of the domain, or one decision rule, and each column represents an attribute and its values. In the information system, each attribute corresponds to an equivalence relation, and a table can be regarded as a family of equivalence relations, i.e., a knowledge base [24,25].
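The indiscernibility relation and its equivalence classes can be illustrated with a minimal Python sketch. The table `TABLE` and the helper names `ind_classes` and `eq_class` are hypothetical and introduced only for illustration; the values are not taken from this article's data.

```python
from collections import defaultdict

# Hypothetical toy information table (not data from this article):
# object -> {attribute: value}
TABLE = {
    "x1": {"a": 1, "b": 2},
    "x2": {"a": 1, "b": 2},
    "x3": {"a": 2, "b": 2},
    "x4": {"a": 2, "b": 1},
}


def ind_classes(table, attrs):
    """Return the partition U/IND(R): objects grouped by equal values on attrs."""
    groups = defaultdict(set)
    for obj, row in table.items():
        key = tuple(row[a] for a in sorted(attrs))
        groups[key].add(obj)
    return list(groups.values())


def eq_class(table, attrs, x):
    """Return [x]_R: all objects that agree with x on every attribute in attrs."""
    return {y for y, row in table.items()
            if all(row[a] == table[x][a] for a in attrs)}
```

Under these assumptions, `x1` and `x2` agree on both attributes and so fall in the same class, while `x3` and `x4` are separated as soon as attribute `b` is taken into account.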

Upper Approximation and Lower Approximation
Generally, an imprecise concept is expressed by two precise concepts, the upper approximation and the lower approximation. Assume $(U, R)$ is the approximation space, where U is the domain, R is an equivalence relation on U, and $X \subseteq U$. The lower approximation set of X with respect to R is defined as:
$$\underline{R}(X) = \{x \in U : [x]_R \subseteq X\}$$
$\underline{R}(X)$ is the set of all objects that definitely belong to X according to the available knowledge; it is also called the positive region of X and recorded as POS(X). The set of all objects that definitely do not belong to X according to the available knowledge is called the negative region of X and recorded as NEG(X). The upper approximation set of X with respect to R is defined as:
$$\overline{R}(X) = \{x \in U : [x]_R \cap X \neq \emptyset\}$$
$\overline{R}(X)$ is the union of all equivalence classes $[x]_R$ whose intersection with X is not empty, i.e., the set of all objects that may belong to X. Consequently, $\overline{R}(X) \cap \mathrm{NEG}(X) = \emptyset$ and $U = \overline{R}(X) \cup \mathrm{NEG}(X)$. The boundary region of set X is defined as:
$$\mathrm{BND}(X) = \overline{R}(X) - \underline{R}(X)$$
BND(X) is the difference between the upper approximation and the lower approximation of set X. If $\mathrm{BND}(X) = \emptyset$, X is said to be exact (crisp) with respect to R; if $\mathrm{BND}(X) \neq \emptyset$, X is a rough set with respect to R.
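The approximations and regions described above can be computed directly from a partition of the domain. The following is a minimal sketch; the universe `U`, the partition `classes` and the target set `X` are hypothetical values chosen for illustration.

```python
def approximations(classes, X):
    """Return (lower, upper) approximations of X given the equivalence classes."""
    lower, upper = set(), set()
    for c in classes:
        if c <= X:   # class lies entirely inside X: its objects certainly belong
            lower |= c
        if c & X:    # class overlaps X: its objects possibly belong
            upper |= c
    return lower, upper


# Hypothetical example: six objects partitioned into three equivalence classes.
U = {1, 2, 3, 4, 5, 6}
classes = [{1, 2}, {3, 4}, {5, 6}]
X = {1, 2, 3}

lower, upper = approximations(classes, X)
boundary = upper - lower   # BND(X)
negative = U - upper       # NEG(X)
```

Here the class `{3, 4}` only partly overlaps `X`, so it forms the boundary region and `X` is rough with respect to this partition.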

Reduction and the Core of Attribute
Reduction is the most important concept used by rough sets in data analysis. Knowledge acquisition with rough sets consists mainly of reducing the original decision table. The decision table is reduced under the condition that the dependency relationship between the decision attributes and the condition attributes remains unchanged.
Attribute reduction of the decision table [26] removes the unnecessary condition attributes (those not important for obtaining the decision) from the condition attributes of the decision table, and then analyzes the influence of the remaining condition attributes on the decision attribute. Under different systems or conditions, people have different requirements and expectations regarding attribute reduction. If the values of some attributes in a system cannot be easily acquired and the expense of measuring them is high, those attributes should be removed from the decision table. The reduction result should include as few condition attributes as possible, or the number of decision rules acquired should be kept to a minimum [27]. Usually, a family of equivalence relations P has several reductions, and the intersection of all reductions is defined as the core of P, recorded as core(P): $\mathrm{core}(P) = \bigcap \mathrm{red}(P)$, where red(P) is the set of all reductions of P. The core represents invariable information: it is the set of knowledge characteristics that cannot be eliminated during knowledge reduction, and it can be used as the starting point for calculating all reductions. In concrete applications of rough sets, the relationship of one classification to another is very important, which leads to the relative reduction and relative core of knowledge [28,29].
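A brute-force sketch of reductions and the core is given below. It assumes that a reduction is a minimal condition attribute subset that preserves the positive region of the full condition set, and it enumerates all subsets, which is only practical for small tables. The table `ROWS` and all function names are hypothetical and not taken from this article.

```python
from itertools import combinations

# Hypothetical toy decision table: condition attributes a, b, c; decision e.
ROWS = [
    {"a": 1, "b": 1, "c": 1, "e": 1},
    {"a": 1, "b": 2, "c": 2, "e": 2},
    {"a": 2, "b": 1, "c": 1, "e": 2},
    {"a": 2, "b": 2, "c": 2, "e": 1},
]


def partition(rows, attrs):
    """Group row indices by their values on attrs."""
    groups = {}
    for i, row in enumerate(rows):
        groups.setdefault(tuple(row[a] for a in attrs), set()).add(i)
    return list(groups.values())


def positive_region(rows, cond, dec):
    """Rows whose condition class lies inside a single decision class."""
    dec_classes = partition(rows, [dec])
    pos = set()
    for block in partition(rows, cond):
        if any(block <= d for d in dec_classes):
            pos |= block
    return pos


def reducts(rows, cond, dec):
    """All minimal subsets of cond with the same positive region as cond."""
    full = positive_region(rows, cond, dec)
    found = []
    for size in range(1, len(cond) + 1):
        for subset in combinations(cond, size):
            if any(set(r) <= set(subset) for r in found):
                continue  # contains a smaller reduct, so it is not minimal
            if positive_region(rows, list(subset), dec) == full:
                found.append(subset)
    return found


def core(rows, cond, dec):
    """Intersection of all reducts."""
    reds = reducts(rows, cond, dec)
    return set.intersection(*(set(r) for r in reds)) if reds else set()
```

In this toy table, attributes `b` and `c` carry the same information, so either can be dropped but `a` cannot: the reductions are `("a", "b")` and `("a", "c")`, and the core is `{"a"}`.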

Knowledge Acquisition
Currently, most methods for acquiring knowledge are based on machine learning, pattern recognition, statistics, etc. The decision tree method is mainly used to obtain descriptions and models of the data in a database. Nonlinear regression analysis and classification methods use regression analysis to generate a function that maps a data item to a real-valued prediction variable, finding the dependency relationships between variables or attributes. Rough set theory introduces the concepts of the discernibility matrix and attribute importance and can handle fuzzy and imperfect knowledge well, thus becoming an important tool for acquiring knowledge from a database [30].
The importance of the attribute subset $B'$ in the attribute set B ($B' \subseteq B$) is defined as $r_B(F) - r_{B \setminus B'}(F)$, where F is the classification induced by the decision attribute D. This expression measures the change in the quality of approximation of the classification F when the attribute subset $B'$ is removed from the attribute set B. On this basis, one of the important subjects of rough set theory is the calculation of attribute reductions. Calculating an attribute reduction amounts to transforming the original decision table into an equivalent decision table that contains only the reduced attributes without changing the decision result, which makes the subsequent extraction of rules more convenient. Researchers have proposed many attribute reduction algorithms.
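The attribute importance measure can be sketched as follows. The sketch assumes that the quality of approximation is the fraction of objects in the positive region; the table `ROWS` and the function names `quality` and `significance` are hypothetical.

```python
# Hypothetical toy decision table: condition attributes a, b; decision e.
ROWS = [
    {"a": 1, "b": 1, "e": 1},
    {"a": 1, "b": 2, "e": 1},
    {"a": 2, "b": 1, "e": 2},
    {"a": 2, "b": 2, "e": 3},
]


def partition(rows, attrs):
    """Group row indices by their values on attrs."""
    groups = {}
    for i, row in enumerate(rows):
        groups.setdefault(tuple(row[a] for a in attrs), set()).add(i)
    return list(groups.values())


def quality(rows, attrs, dec):
    """Quality of approximation: share of rows classified unambiguously by attrs."""
    dec_classes = partition(rows, [dec])
    pos = sum(len(block) for block in partition(rows, attrs)
              if any(block <= d for d in dec_classes))
    return pos / len(rows)


def significance(rows, attrs, removed, dec):
    """Drop in quality of approximation when `removed` is taken out of attrs."""
    rest = [a for a in attrs if a not in removed]
    return quality(rows, attrs, dec) - quality(rows, rest, dec)
```

In this toy table, removing `a` leaves no row classifiable, whereas removing `b` still leaves half of the rows classifiable, so `a` is the more important attribute.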

Dependency of Knowledge
To reduce knowledge and derive new knowledge from the given knowledge, the dependency relationships in the database must be studied. When all elementary categories of Q can be defined by some elementary categories of P, knowledge Q can be derived from knowledge P. In that case, Q is said to depend on P, recorded as $P \Rightarrow Q$. The dependency can be formally defined as follows. Let $K = (U, R)$ be a knowledge base and $P, Q \subseteq R$: (1) knowledge Q depends on knowledge P, $P \Rightarrow Q$, if and only if $\mathrm{IND}(P) \subseteq \mathrm{IND}(Q)$; (2) when $P \Rightarrow Q$ and $Q \Rightarrow P$, knowledge P and Q are equivalent, recorded as $P \equiv Q$; (3) when neither $P \Rightarrow Q$ nor $Q \Rightarrow P$ holds, P and Q are independent. The knowledge dependency measure k is defined as $k = r_P(Q) = |\mathrm{POS}_P(Q)| / |U|$, and Q is said to depend on P to degree k ($0 \le k \le 1$), recorded as $P \Rightarrow_k Q$. When $k = 1$, knowledge Q fully depends on knowledge P. When $k < 1$, knowledge Q does not fully depend on knowledge P. The value of k can be used to determine the dependency relationship between the decision attribute and the condition attributes. In concrete applications of rough set knowledge theory, some knowledge cannot be completely derived, but its derivability can be measured by the dependency of knowledge [31,32].
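As a worked illustration of the dependency measure, consider a hypothetical domain of eight objects in which six objects belong to the positive region of Q with respect to P. These counts are assumed for illustration only and are not taken from Table 1.

```latex
k = r_P(Q) = \frac{|\mathrm{POS}_P(Q)|}{|U|} = \frac{6}{8} = 0.75
```

Since $k < 1$, Q does not fully depend on P under these assumed counts: only six of the eight objects can be assigned to a category of Q using knowledge P alone.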

Application Steps Based on the Knowledge Theory of the Rough Set
Generally, knowledge is applied in the form of a decision table [33]. To explain how the knowledge theory of the rough set is applied, a method for processing the information in the decision table is proposed, and the main steps of the method are as follows: (1) Initialize the domain, process missing data, and discretize and normalize the attribute values. In generating the decision table T, the division criteria for the discrete intervals provided by field experts should be used to select proper break points that divide the space formed by the condition attributes, so as to reduce the search space. The decision rule table T(U,A,C,D) [34] is obtained after strict data preprocessing.

Application and Research of the Theory of the Rough Set
Consider an investment in a large-scale hydraulic engineering construction project with eight alternative projects. The set of alternative projects is expressed by $\{a_1, a_2, \ldots, a_8\}$. Before investing, the investor needs to carry out a risk evaluation of these projects and then decide whether to invest or not. There are four main standards for evaluation: construction expense, financial income, strategic benefit and external influence, and {investment, no investment} is the decision set of the investor. The indexes are given in Table 1. Limited by conditions, the information of some projects cannot be acquired and is expressed by *. Here $A = \{a_1, a_2, \ldots, a_8\}$ expresses the set of investment projects, $C = \{c_1, c_2, c_3, c_4\}$ is the set of evaluation indexes, and d means the investment decision of the investor.
Additionally, "1" represents the decision of investment, "2" represents the decision of delayed investment, and "3" represents the decision of no investment; thus, {1,2,3} is the set of decision values, and all decisions are confirmed here.
Discrete processing is carried out on the engineering tender risk and is expressed by 0, 1, 2, 3. The concrete discrete method is as follows: the index value of the construction expense is expressed by a; if the construction expense is higher than 1000, "1" is used for expression; if the construction expense is 500 ≤ a < 1000, "2" is used for expression; and if a < 500, "3" is used for expression.
The index value of financial income is expressed by b; if the financial income is higher than 500, "3" is used for expression; if 250 ≤ b < 500, "2" is used for expression; and if b < 250, "1" is used for expression.
The index value of the strategic benefit is expressed by c; if the strategic benefit is higher than 550, "3" is used for expression; if 350 ≤ c < 550, "2" is used for expression; and if c < 250, "1" is used for expression.
The index value of external influence is expressed by d; if the external influence is higher than 400, "3" is used for expression; if 200 ≤ d < 400, "2" is used for expression; and if d < 200, "1" is used for expression.
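The discretization described above can be sketched as a lookup of interval tests. This sketch implements only the intervals exactly as stated in the text; a value that falls in none of the stated intervals returns `None` rather than being guessed, and an unavailable value is coded as `*`. The names `INTERVALS` and `discretize` and the sample inputs are hypothetical.

```python
# Interval tests per index, exactly as stated in the text:
# a = construction expense, b = financial income,
# c = strategic benefit, d = external influence.
INTERVALS = {
    "a": [(lambda v: v > 1000, 1), (lambda v: 500 <= v < 1000, 2), (lambda v: v < 500, 3)],
    "b": [(lambda v: v > 500, 3), (lambda v: 250 <= v < 500, 2), (lambda v: v < 250, 1)],
    "c": [(lambda v: v > 550, 3), (lambda v: 350 <= v < 550, 2), (lambda v: v < 250, 1)],
    "d": [(lambda v: v > 400, 3), (lambda v: 200 <= v < 400, 2), (lambda v: v < 200, 1)],
}


def discretize(index, value):
    """Map a raw index value to its discrete code.

    Returns "*" when the value is unavailable (None), and None when the
    value falls in none of the stated intervals.
    """
    if value is None:
        return "*"
    for test, code in INTERVALS[index]:
        if test(value):
            return code
    return None
```

For example, `discretize("a", 700)` yields code 2 and `discretize("b", 600)` yields code 3. Note that construction expense is coded in the opposite direction to the other three indexes: a low expense receives the highest code.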
After discretization, the simplified table of decision rules is obtained; it is calculated according to the following concrete steps:
Step 1: data processing. For each project, the construction expense, financial income, strategic benefit, external influence, and investment decision are expressed by a, b, c, d and e, respectively, and the information in Table 1 is entered into the decision table shown in Table 2.
Step 2: attribute reduction. Table 2 is a consistent decision table; Table 3 is obtained after attribute reduction according to the attribute core.
Step 3: reduce the values of the decision table shown in Table 3. Similarly, if available, we can obtain the reductions of the seven other decision rules:
(1) $b_1 \to e_2$
(2) $d_3 \to e_3$
(3) $a_1 d_2 \to e_3$
(4) $a_3 d_2 \to e_1$
(6) $a_1 d_1 \to e_2$ or $b_2 d_1 \to e_2$
(7) $a_3 d_1 \to e_1$ or $a_3 b_3 \to e_1$
(8) $a_2 b_3 \to e_3$
Step 4: by arranging the decision table shown in Table 3, there will be no repeated decision.
Step 5: integrate the results obtained from the above four steps and obtain the following decision algorithm:
(1) $a_3 d_2 \lor a_3 d_1 \lor a_3 b_3 \to e_1$
(2) $a_1 d_1 \lor b_2 d_1 \lor b_1 \to e_2$
(3) $d_3 \lor a_2 b_3 \lor a_1 d_2 \to e_3$
The above decision rules can be explained in words as follows:
(1) When the construction expense of hydraulic engineering or the external influence is low, or the construction expense is low but the financial income is high, investment in the construction project can be selected;
(2) When the expense of the hydraulic construction project is low and the external influence is common or the financial expense is common, if the external influence is low or the financial income is high, the investment can be delayed;
(3) When the construction strategic benefit of hydraulic engineering is low, or the financial benefit is common, or the construction expense is common, the decision of no investment can be selected.
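The decision algorithm of Step 5 can be applied mechanically to a discretized project, as in the following sketch. The rule list transcribes the three disjunctions of Step 5; the function name `decide` and the sample projects are hypothetical, and the function returns a set because the sketch makes no assumption about how overlapping or missing matches should be resolved.

```python
# Each entry: (required attribute codes, decision code), transcribed from Step 5.
# Decision codes: 1 = investment, 2 = delayed investment, 3 = no investment.
RULES = [
    ({"a": 3, "d": 2}, 1), ({"a": 3, "d": 1}, 1), ({"a": 3, "b": 3}, 1),
    ({"a": 1, "d": 1}, 2), ({"b": 2, "d": 1}, 2), ({"b": 1}, 2),
    ({"d": 3}, 3), ({"a": 2, "b": 3}, 3), ({"a": 1, "d": 2}, 3),
]


def decide(project):
    """Return the set of decision codes whose rule conditions match the project."""
    return {decision for cond, decision in RULES
            if all(project.get(attr) == code for attr, code in cond.items())}


# Hypothetical discretized projects (not rows of Table 2).
invest = decide({"a": 3, "b": 3, "c": 2, "d": 2})
delay = decide({"a": 1, "b": 2, "c": 2, "d": 1})
reject = decide({"a": 2, "b": 3, "c": 1, "d": 3})
```

An empty result means that no rule covers the project, and a result with more than one code means that the rules conflict for that combination of values; both cases would need expert judgment.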
Through comparison, it can be observed that the conclusions of this method regarding investment decisions are consistent with those in the literature [35,36], which further supports the stability and reliability of rough set theory in investment decisions.

Conclusions
Rough set theory is highly applicable to processing uncertain factors and imperfect information. By simplifying the knowledge in the data and removing the redundant attributes influencing the decision, the minimum decision rules are acquired. Based on the application of rough sets to the investment decision of hydraulic engineering, this article suggests that different combinations of investment conditions call for different investment decision strategies. The following conclusions can be obtained: (1) Investment in the construction of hydraulic engineering involves various uncertain factors. Compared with other investment decision methods, the rough set method acquires rules from information that is hidden in the decision table; thus, its adaptability is higher, and high precision can be achieved. (2) To acquire the rules of the investment decision regarding hydraulic engineering, the rough set method involves more condition attributes and cannot easily be expressed graphically; thus, its application is not as direct as that of common graphical investment decision methods.
(3) The data processed by the rough set method are discrete. Common methods cannot handle discrete data well and, as a result, may not reach scientific conclusions.
One limitation of this research is the scarcity of data: the analysis is based on only eight alternative large-scale hydraulic engineering projects. To obtain more significant results, it is recommended that the sample size be expanded in future studies.