This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

A large number of parameters are acquired during practical water quality monitoring. Using all of them in a water quality assessment increases the computational complexity. To reduce the input space dimensions, a fuzzy rough set was introduced to perform attribute reduction. An attribute recognition theoretical model and the entropy method were then combined to assess water quality in the Harbin reach of the Songhuajiang River in China. A dataset consisting of ten parameters was collected from January to October 2012. The fuzzy rough set reduced the ten parameters to four: BOD_{5}, NH_{3}-N, TP, and fecal coliforms (Reduct A). A second reduct containing DO, BOD_{5}, NH_{3}-N, TP, TN, F, and fecal coliforms (Reduct B) was obtained for comparison.

As human activities have intensified in recent years, water pollution has become increasingly serious and has drawn much local and international attention.

The determination of weights is a vital aspect of water quality assessment, because the parameter weights directly affect the assessment results. The choice of an appropriate weight determination method has therefore received growing attention, and a large number of such methods have been introduced to assess water quality.

Besides the determination of weights, the selection of parameters is another important issue in water quality assessment. A large number of parameters are obtained during water quality monitoring, yet not all of them are equally important, and some are even irrelevant to the assessment results. Using every monitored parameter to assess water quality complicates the computation. Parameters are usually chosen on the basis of subjective experience to reduce the input space dimensions, but this practice is unreliable to some extent. To be objective, Principal Component Analysis (PCA) and Factor Analysis (FA) are used to reduce the input space dimensions.

The Songhuajiang River, with a total length of 1,657 km and a drainage area of about 556,800 km^{2}, lies between latitudes 41°42′ and 51°48′ N and longitudes 119°52′ and 132°31′ E. The total runoff is 75.9 billion m^{3}. Its headstream includes a southern source and a northern source. The southern source, the Second Songhuajiang River, originates from Heaven Lake in Jilin Province, and the northern source, the Nenjiang River, originates from the southern slopes of the middle part of Yilehuli Mountain, a branch of China’s Great Hinggan Mountains. After the two sources converge at Sanchahe Town in Fuyu City, the river is called the Songhuajiang River (Songhuajiang main stream) and runs eastward until it empties into the Heilongjiang River in Tongjiang City. The Songhuajiang River has a long icebound season and two flood seasons, one in spring and one in summer. Harbin station, the major station after the convergence of the Second Songhuajiang River and the Nenjiang River, is situated on the midstream of the Songhuajiang River. The river is both the water source and the receiving water body for wastewater of Harbin City, the capital city of Heilongjiang Province.

The data for the Harbin reach from January to October 2012 were chosen as the research target. Ten parameters were monitored: pH, dissolved oxygen (DO), chemical oxygen demand determined with KMnO_{4} (COD_{Mn}), chemical oxygen demand (COD), 5-day biochemical oxygen demand (BOD_{5}), ammonia nitrogen (NH_{3}-N), total phosphorus (TP), total nitrogen (TN), fluoride (F), and fecal coliforms.

An information system represented by a table should first be constructed. In the table, a set of objects is described by a set of attributes. Here U = {x_{1}, x_{2}, …, x_{m}} is a non-empty finite set of objects, A = {a_{1}, a_{2}, …, a_{n}} is a non-empty finite set of attributes, and V_{a} is the value domain of attribute a.

Step 1. Standardization of the initial data.

Suppose that m objects and n parameters form the matrix R = (r_{ij})_{m×n}, where R is the initial decision matrix and r_{ij} (i = 1, 2, …, m; j = 1, 2, …, n) is the value of the jth parameter for the ith object.

Parameters fall into three types, each with its own standardization function: the efficiency type, for which larger values are better; the cost type, for which smaller values are better; and the interval type, for which values inside the best interval [q_{1}, q_{2}] of r_{ij} are preferred.

After normalization of R, the standard-grade matrix Y = (y_{ij})_{m×n} is obtained.
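As an illustration of Step 1, the sketch below standardizes one parameter column with min-max scaling. The exact standardization functions are not reproduced in this text, so the three formulas used here (including the interval-type rule) are assumptions based on common practice, and the function name `standardize_column` is hypothetical.

```python
from typing import List, Optional, Tuple


def standardize_column(values: List[float], kind: str,
                       best: Optional[Tuple[float, float]] = None) -> List[float]:
    """Scale one parameter column to [0, 1]; 1 is the most favourable value.

    Assumed forms (not taken from the source):
      efficiency: (r - min) / (max - min)
      cost:       (max - r) / (max - min)
      interval:   1 - distance to [q1, q2] / largest such distance
    """
    lo, hi = min(values), max(values)
    if kind in ("efficiency", "cost"):
        if hi == lo:                      # constant column carries no information
            return [1.0] * len(values)
        if kind == "efficiency":          # larger is better
            return [(v - lo) / (hi - lo) for v in values]
        return [(hi - v) / (hi - lo) for v in values]   # smaller is better
    if kind == "interval":                # best inside [q1, q2]
        if best is None:
            raise ValueError("interval type needs the best interval (q1, q2)")
        q1, q2 = best
        d_max = max(q1 - lo, hi - q2)     # largest distance to the interval
        if d_max <= 0:                    # every value already lies in [q1, q2]
            return [1.0] * len(values)
        return [1.0 - max(q1 - v, v - q2, 0.0) / d_max for v in values]
    raise ValueError(f"unknown parameter type: {kind}")


# Illustrative values only (not monitoring data).
demo_efficiency = standardize_column([1.0, 2.0, 3.0], "efficiency")
demo_cost = standardize_column([1.0, 2.0, 3.0], "cost")
demo_interval = standardize_column([5.0, 7.0, 10.0], "interval", best=(6.0, 8.0))
```

Applying the function column by column to R would give a matrix that plays the role of Y in the following steps.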

Step 2. Determination of fuzzy similarity class.

For all x_{s}, x_{t} ∈ U, the fuzzy similarity relation x_{s}Rx_{t} holds when the distance between x_{s} and x_{t} does not exceed α, so that 1 − α is the similarity degree of x_{s} and x_{t}. The value α was set to 0.3 in this study. FR(x_{i}), the fuzzy similarity class of x_{i}, is obtained by collecting all the objects that are fuzzy similar to x_{i}:

FR(x_{i}) = {x_{t} ∈ U | x_{i}Rx_{t}}
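The construction of fuzzy similarity classes in Step 2 can be sketched as follows. The distance formula is not reproduced in this text, so the mean absolute difference over the standardized parameters is used here purely as an assumption; the function name and the demo matrix are hypothetical.

```python
from typing import FrozenSet, List, Optional, Sequence


def fuzzy_similarity_classes(Y: Sequence[Sequence[float]], alpha: float = 0.3,
                             attrs: Optional[Sequence[int]] = None) -> List[FrozenSet[int]]:
    """Return, for every object s, the set of objects t with distance(s, t) <= alpha.

    Assumption: distance = mean absolute difference over the selected columns.
    `attrs` restricts the comparison to a subset of parameter columns.
    """
    cols = list(range(len(Y[0]))) if attrs is None else list(attrs)
    classes = []
    for s in range(len(Y)):
        members = set()
        for t in range(len(Y)):
            dist = sum(abs(Y[s][j] - Y[t][j]) for j in cols) / len(cols)
            if dist <= alpha:
                members.add(t)          # t is fuzzy similar to s
        classes.append(frozenset(members))
    return classes


# Illustrative standardized matrix: three objects, two parameters.
Y_demo = [[0.0, 0.0], [0.2, 0.2], [1.0, 1.0]]
demo_classes = fuzzy_similarity_classes(Y_demo, alpha=0.3)
```

Every object is always similar to itself, so no class is empty. Passing `attrs` with one column left out gives the classification without that parameter.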

Step 3. Calculation of lower approximation of variable precision rough set.

Attribute reduction with the Pawlak rough set (PRS) relies on the lower approximation, which is based on strict set inclusion. This is sufficient in many applications, but real-world data are noisy. The variable precision rough set (VPRS) was introduced to relax the restrictive lower approximation. By setting a confidence threshold β, VPRS handles classification problems with uncertain data and classifies objects with a permissible error no greater than a predefined level.

Let X be the objects classification obtained with all the parameters, and let FR(a_{i}) be the objects classification obtained without the parameter a_{i}. X and FR(a_{i}) can be obtained by Equation (8). Let the confidence threshold β (0.5 < β ≤ 1) be a real number. The β-lower approximation $\underline{R}_{\beta}(a_{i})$ of VPRS is the set of objects whose class in FR(a_{i}) is included in X to a degree of at least β, where the inclusion degree is a ratio of set cardinalities and |·| denotes the cardinality of a set.

Step 4. Calculation of β-approximate classification quality.

The β-approximate classification quality is shown as:

$$\gamma_{R}(a_{i}) = \frac{|\underline{R}_{\beta}(a_{i})|}{|U|}$$

The β-approximate classification quality of the classification by all attributes, taken with respect to itself, equals 1. If the classification after eliminating the attribute a_{i} is the same as that before attribute reduction, the β-approximate classification quality is also 1. Therefore, attribute reduction based on the β-approximate classification quality involves ensuring that $\gamma_{R}(a_{i}) = 1$.
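Steps 3 and 4 can be illustrated with a small sketch. The defining equations are not reproduced in this text, so the code follows one plausible reading: an object belongs to the β-lower approximation when its similarity class without a_{i} overlaps its class under all attributes to a degree of at least β, and the quality is the share of such objects. The function name, the value β = 0.8, and the demo classes are assumptions for illustration.

```python
from typing import FrozenSet, Sequence


def beta_classification_quality(full_classes: Sequence[FrozenSet[int]],
                                reduced_classes: Sequence[FrozenSet[int]],
                                beta: float = 0.8) -> float:
    """Share of objects whose reduced class is beta-included in its full class.

    full_classes[i]    : similarity class of object i under all attributes (X)
    reduced_classes[i] : similarity class of object i without attribute a_i (FR(a_i))
    Each class is assumed non-empty, since an object is always similar to itself.
    """
    if not 0.5 < beta <= 1.0:
        raise ValueError("beta must satisfy 0.5 < beta <= 1")
    kept = 0
    for full, reduced in zip(full_classes, reduced_classes):
        inclusion = len(full & reduced) / len(reduced)   # inclusion degree
        if inclusion >= beta:
            kept += 1                                    # object is in the lower approximation
    return kept / len(full_classes)


# Illustrative classes for three objects.
X_demo = [frozenset({0}), frozenset({1}), frozenset({2})]
FR_demo = [frozenset({0, 1}), frozenset({0, 1}), frozenset({2})]
quality_same = beta_classification_quality(X_demo, X_demo)
quality_changed = beta_classification_quality(X_demo, FR_demo)
```

Under this reading, a quality of 1 means that dropping the attribute leaves every object's class unchanged up to the tolerance β, which is the deletion criterion described above.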

The entropy method is an objective tool that determines the weights of parameters from the degree of difference among their values. It is calculated as follows.

The information entropy is calculated first:

$$H_{j} = -\frac{1}{\ln m} \sum_{i=1}^{m} f_{ij} \ln f_{ij}$$

where H_{j} is the information entropy of the jth parameter and $f_{ij} = y_{ij} / \sum_{i=1}^{m} y_{ij}$.

Then the entropy weight of the jth parameter is:

$$w_{j} = \frac{1 - H_{j}}{n - \sum_{k=1}^{n} H_{k}}$$
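A minimal sketch of the entropy method follows. It assumes the common form in which f_{ij} is the share of y_{ij} in its column and 0·ln 0 is taken as 0; both conventions, as well as the function name and demo matrix, are assumptions rather than details confirmed by the text.

```python
import math
from typing import List, Sequence, Tuple


def entropy_weights(Y: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Return (information entropies H_j, entropy weights w_j) for the columns of Y.

    Assumes m > 1 objects, non-negative entries and a positive sum in every column.
    """
    m, n = len(Y), len(Y[0])
    H = []
    for j in range(n):
        column = [Y[i][j] for i in range(m)]
        total = sum(column)
        if total <= 0:
            raise ValueError(f"column {j} must have a positive sum")
        h = 0.0
        for v in column:
            f = v / total
            if f > 0:                    # convention: 0 * ln(0) = 0
                h -= f * math.log(f)
        H.append(h / math.log(m))        # normalise so that 0 <= H_j <= 1
    spare = [1.0 - h for h in H]         # degree of difference of each parameter
    norm = sum(spare)
    if norm == 0:
        raise ValueError("all columns are uniform; weights are undefined")
    return H, [s / norm for s in spare]


# Illustrative matrix: the first column is uniform, the second is concentrated.
H_demo, w_demo = entropy_weights([[1.0, 1.0], [1.0, 0.0]])
```

A uniform column has entropy 1 and receives no weight, while a column whose values differ strongly receives a large weight.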

The specific steps of ARTM are stated as follows.

Step 1. Establishment of attribute space matrix.

There are m objects and n parameters in the object space R = (r_{ij})_{m×n}.

Suppose F is some attribute space, and (C_{1}, C_{2}, …, C_{K}) is an ordered series of ranks in attribute space F, satisfying C_{1} > C_{2} > … > C_{K}. Since the classification standard for each parameter is known, the classification standard matrix can be expressed as A = (a_{jk})_{n×K}, where a_{jk} satisfies a_{j1} < a_{j2} < ⋯ < a_{jK} or a_{j1} > a_{j2} > ⋯ > a_{jK}.

Step 2. Determination of attribute measure.

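The attribute measure μ_{ijk} = μ(r_{ij} ∈ C_{k}) of r_{ij}, which takes the attribute levels from the set C_{k}, is calculated. Suppose that a_{j1} < a_{j2} < ⋯ < a_{jK}:

when r_{ij} ≤ a_{j1}, assume that μ_{ij1} = 1, μ_{ij2} = ⋯ = μ_{ijK} = 0;

when r_{ij} ≥ a_{jK}, assume that μ_{ijK} = 1, μ_{ij1} = ⋯ = μ_{ijK−1} = 0;

when a_{jl} ≤ r_{ij} ≤ a_{jl+1}, assume that

$$\mu_{ijl} = \frac{|r_{ij} - a_{j,l+1}|}{|a_{jl} - a_{j,l+1}|}, \quad \mu_{ij,l+1} = \frac{|r_{ij} - a_{jl}|}{|a_{jl} - a_{j,l+1}|}, \quad \mu_{ijk} = 0 \ (k < l \ \text{or} \ k > l+1).$$

Considering the weights, the attribute measure of x_{i} is shown as:

$$\mu_{ik} = \mu(x_{i} \in C_{k}) = \sum_{j=1}^{n} w_{j}\,\mu_{ijk}$$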

Step 3. Establishment of attribute recognition theoretical model.

The confidence level λ is applied to recognize x_{i} and described as below:

$$k_{i} = \min\left\{ k : \sum_{l=1}^{k} \mu_{il} \ge \lambda,\ 1 \le k \le K \right\}$$

In the formula, x_{i} is taken to belong to C_{k_{i}}.
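The three ARTM steps can be sketched as below for parameters whose class limits increase from Rank I to Rank V. The linear attribute measure and the cumulative confidence criterion follow the usual form of attribute recognition models; the confidence level λ = 0.65 and the sample values (medians from the statistical summary, used only as an illustrative object) are assumptions. The class limits and weights are copied from the tables in this article.

```python
from typing import List, Sequence, Tuple


def single_measure(r: float, limits: Sequence[float]) -> List[float]:
    """Attribute measures of one value against ascending class limits a_j1 <= ... <= a_jK."""
    K = len(limits)
    mu = [0.0] * K
    if r <= limits[0]:
        mu[0] = 1.0
    elif r >= limits[-1]:
        mu[-1] = 1.0
    else:
        for l in range(K - 1):
            lower, upper = limits[l], limits[l + 1]
            if lower < upper and lower <= r <= upper:   # skip zero-width classes
                mu[l] = (upper - r) / (upper - lower)
                mu[l + 1] = (r - lower) / (upper - lower)
                break
    return mu


def recognize(values: Sequence[float], standards: Sequence[Sequence[float]],
              weights: Sequence[float], lam: float = 0.65) -> Tuple[int, List[float]]:
    """Return (rank k_i, weighted attribute measures) for one object."""
    K = len(standards[0])
    mu = [0.0] * K
    for r, limits, w in zip(values, standards, weights):
        for k, m_k in enumerate(single_measure(r, limits)):
            mu[k] += w * m_k
    cumulative = 0.0
    for k, m_k in enumerate(mu):
        cumulative += m_k
        if cumulative >= lam:            # first rank reaching the confidence level
            return k + 1, mu
    return K, mu


# Class limits for BOD5, NH3-N, TP, fecal coliforms (Ranks I to V) and entropy weights.
STANDARDS = [[3, 3, 4, 6, 10], [0.15, 0.5, 1.0, 1.5, 2.0],
             [0.02, 0.1, 0.2, 0.3, 0.4], [200, 2000, 10000, 20000, 40000]]
WEIGHTS = [0.3701, 0.3802, 0.1263, 0.1234]
rank_demo, mu_demo = recognize([2.4, 0.44, 0.07, 1514], STANDARDS, WEIGHTS)
```

Parameters with descending limits, such as DO, would need the mirrored rule and are left out of this sketch.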

The Environmental Quality Standards for Surface Water of China (EQSSWC) are listed in the table below.

Environmental Quality Standards for Surface Water of China.

Parameters | I | II | III | IV | V |
---|---|---|---|---|---|
pH | 6–9 | 6–9 | 6–9 | 6–9 | 6–9 |
DO (mg/L) | ≥7.5 | ≥6 | ≥5 | ≥3 | ≥2 |
COD_{Mn} (mg/L) | ≤2 | ≤4 | ≤6 | ≤10 | ≤15 |
COD (mg/L) | ≤15 | ≤15 | ≤20 | ≤30 | ≤40 |
BOD_{5} (mg/L) | ≤3 | ≤3 | ≤4 | ≤6 | ≤10 |
NH_{3}-N (mg/L) | ≤0.15 | ≤0.5 | ≤1.0 | ≤1.5 | ≤2.0 |
TP (mg/L) | ≤0.02 | ≤0.1 | ≤0.2 | ≤0.3 | ≤0.4 |
TN (mg/L) | ≤0.2 | ≤0.5 | ≤1.0 | ≤1.5 | ≤2.0 |
F (mg/L) | ≤1.0 | ≤1.0 | ≤1.0 | ≤1.5 | ≤1.5 |
Fecal coliforms (count/L) | ≤200 | ≤2,000 | ≤10,000 | ≤20,000 | ≤40,000 |
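Reading a rank off the standards table amounts to finding the first upper limit that a value does not exceed. The short sketch below does this for parameters with upper limits, using the TN limits from the table; the function name is hypothetical, and DO (a lower-limit parameter) and pH are not covered.

```python
import bisect
from typing import Sequence

RANKS = ["I", "II", "III", "IV", "V"]
TN_LIMITS = [0.2, 0.5, 1.0, 1.5, 2.0]   # mg/L, Ranks I to V


def rank_for_upper_limits(value: float, limits: Sequence[float]) -> str:
    """Best rank whose upper limit is not exceeded, or 'worse than V'."""
    idx = bisect.bisect_left(limits, value)   # first limit >= value
    return RANKS[idx] if idx < len(limits) else "worse than V"


# TN minimum, mean and maximum as reported in the statistical summary of this article.
tn_ranks = [rank_for_upper_limits(v, TN_LIMITS) for v in (1.1, 1.607, 2.58)]
```

Because `bisect_left` returns the first matching position, parameters whose limits repeat across ranks (for example F) are assigned the best applicable rank.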

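As can be seen in the statistical summary below, pH and the concentration of F are within the permissible limits in every month. The summary also shows that fecal coliforms have the highest coefficient of variation (1.9054) and exceed the permissible limit in one month.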

Statistical analysis results for various parameters.

Parameters | Min–Max | Median | Mean | SD | CV | Permissible Limits | MNEPL ^{a} |
---|---|---|---|---|---|---|---|
pH (a_{1}) | 7.16–8.55 | 7.52 | 7.61 | 0.401 | 0.0527 | 6–9 | 0 |
DO (a_{2}) | 4.8–13 | 7.7 | 8.44 | 2.6073 | 0.3089 | ≥5 | 1 |
COD_{Mn} (a_{3}) | 3.12–6.48 | 5.04 | 5.209 | 0.9733 | 0.1868 | ≤6 | 2 |
COD (a_{4}) | 12–23 | 16.5 | 16.8 | 3.49 | 0.2077 | ≤20 | 1 |
BOD_{5} (a_{5}) | 1–4.6 | 2.4 | 2.69 | 1.4255 | 0.5299 | ≤4 | 3 |
NH_{3}-N (a_{6}) | 0.12–1.07 | 0.44 | 0.535 | 0.3868 | 0.7229 | ≤1.0 | 2 |
TP (a_{7}) | 0.04–0.69 | 0.07 | 0.144 | 0.1978 | 1.3738 | ≤0.2 | 1 |
TN (a_{8}) | 1.1–2.58 | 1.55 | 1.607 | 0.4423 | 0.2752 | ≤1.0 | 10 |
F (a_{9}) | 0.24–0.38 | 0.3 | 0.298 | 0.0419 | 0.1404 | ≤1.0 | 0 |
Fecal coliforms (a_{10}) | 20–24,196 | 1,514 | 3,793.4 | 7,227.91 | 1.9054 | ≤10,000 | 1 |

Note: ^{a} number of months exceeding the permissible limits.
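The CV column is the ratio of the standard deviation to the mean. The sketch below recomputes it from the Mean and SD columns of the table above and compares the result with the reported CV; small deviations are expected because the table entries are rounded. The dictionary layout is an illustrative choice.

```python
# (mean, SD, reported CV) copied from the statistical summary table
SUMMARY = {
    "pH": (7.61, 0.401, 0.0527),
    "DO": (8.44, 2.6073, 0.3089),
    "COD_Mn": (5.209, 0.9733, 0.1868),
    "COD": (16.8, 3.49, 0.2077),
    "BOD5": (2.69, 1.4255, 0.5299),
    "NH3-N": (0.535, 0.3868, 0.7229),
    "TP": (0.144, 0.1978, 1.3738),
    "TN": (1.607, 0.4423, 0.2752),
    "F": (0.298, 0.0419, 0.1404),
    "Fecal coliforms": (3793.4, 7227.91, 1.9054),
}

# Coefficient of variation recomputed as SD / mean
recomputed_cv = {name: sd / mean for name, (mean, sd, _) in SUMMARY.items()}

# Largest gap between the recomputed and the reported CV
max_gap = max(abs(recomputed_cv[name] - reported)
              for name, (_, _, reported) in SUMMARY.items())

# Parameter with the strongest relative variability
most_variable = max(recomputed_cv, key=recomputed_cv.get)
```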

High concentrations of nitrate, nitrite, and NH_{3}-N in drinking water and water sources can be poisonous to humans and aquatic life. NH_{3}-N concentrations beyond the permissible limit reduce the oxygen-binding capacity of aquatic organisms. Fortunately, the NH_{3}-N concentration is reasonably satisfactory, with only two months showing values slightly higher than the permissible limit. Because the Songhuajiang River is the receiving water body for wastewater from Harbin City, the capital city of Heilongjiang Province, the high concentration of TN is mainly attributed to domestic sewage and industrial effluents.

The TN concentration during the study period is illustrated in the plot of TN temporal distribution.

FRS attribute reduction was carried out in MATLAB 8.0. The process of FRS attribute reduction is shown in the table of the same name.

Plot of TN temporal distribution.

Process of FRS attribute reduction.

Subset of Reserved Attributes | Subset of Deleted Attributes | β-Approximate Classification Quality | Delete ^{a} |
---|---|---|---|
{a_{2},a_{3},a_{4},a_{5},a_{6},a_{7},a_{8},a_{9},a_{10}} | {a_{1}} | 1 | Y |
{a_{3},a_{4},a_{5},a_{6},a_{7},a_{8},a_{9},a_{10}} | {a_{1},a_{2}} | 1 | Y |
{a_{4},a_{5},a_{6},a_{7},a_{8},a_{9},a_{10}} | {a_{1},a_{2},a_{3}} | 1 | Y |
{a_{5},a_{6},a_{7},a_{8},a_{9},a_{10}} | {a_{1},a_{2},a_{3},a_{4}} | 1 | Y |
{a_{6},a_{7},a_{8},a_{9},a_{10}} | {a_{1},a_{2},a_{3},a_{4},a_{5}} | 0.7 | N |
{a_{5},a_{7},a_{8},a_{9},a_{10}} | {a_{1},a_{2},a_{3},a_{4},a_{6}} | 0.2 | N |
{a_{5},a_{6},a_{8},a_{9},a_{10}} | {a_{1},a_{2},a_{3},a_{4},a_{7}} | 0.9 | N |
{a_{5},a_{6},a_{7},a_{9},a_{10}} | {a_{1},a_{2},a_{3},a_{4},a_{8}} | 1 | Y |
{a_{5},a_{6},a_{7},a_{10}} | {a_{1},a_{2},a_{3},a_{4},a_{8},a_{9}} | 1 | Y |
{a_{5},a_{6},a_{7}} | {a_{1},a_{2},a_{3},a_{4},a_{8},a_{9},a_{10}} | 0.6 | N |

Notes: ^{a} whether to delete the newly added attribute in the subset of deleted attributes, Y (Yes), N (No).
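The table reads as a backward elimination: attributes are tried one at a time, and an attribute is deleted only if the β-approximate classification quality stays at 1. The sketch below replays that procedure with the quality values copied from the table in place of an actual rough-set computation; the function name and the lookup-table approach are illustrative assumptions.

```python
from typing import Callable, FrozenSet, Iterable, List

# Quality values from the table, keyed by the indices of the deleted attributes.
QUALITY = {
    frozenset({1}): 1.0,
    frozenset({1, 2}): 1.0,
    frozenset({1, 2, 3}): 1.0,
    frozenset({1, 2, 3, 4}): 1.0,
    frozenset({1, 2, 3, 4, 5}): 0.7,
    frozenset({1, 2, 3, 4, 6}): 0.2,
    frozenset({1, 2, 3, 4, 7}): 0.9,
    frozenset({1, 2, 3, 4, 8}): 1.0,
    frozenset({1, 2, 3, 4, 8, 9}): 1.0,
    frozenset({1, 2, 3, 4, 8, 9, 10}): 0.6,
}


def backward_elimination(attributes: Iterable[int],
                         quality: Callable[[FrozenSet[int]], float]) -> List[int]:
    """Try to delete attributes in order; keep a deletion only if quality stays 1."""
    attributes = list(attributes)
    deleted: set = set()
    for a in attributes:
        trial = frozenset(deleted | {a})
        if quality(trial) == 1.0:        # classification unchanged, so a is redundant
            deleted = set(trial)
    return [a for a in attributes if a not in deleted]


reduct_indices = backward_elimination(range(1, 11), QUALITY.__getitem__)
```

With a computed quality function instead of a lookup, a small tolerance would be safer than an exact comparison with 1. The order in which attributes are tried affects which reduct is found.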

From the table of the attribute reduction process, {a_{5}, a_{6}, a_{7}, a_{10}} is one of the minimum subsets that do not change the objects classification obtained with the original attributes. The subset {a_{2}, a_{3}, a_{4}, a_{5}, a_{6}, a_{7}, a_{8}, a_{9}, a_{10}}, which excludes the attribute a_{1}, is used to illustrate the process of attribute reduction. The fuzzy similarity classes under all attributes are denoted X:

X = {{x_{1},x_{2},x_{3}},{x_{1},x_{3},x_{5},x_{10}},{x_{2},x_{3},x_{4}},{x_{3},x_{4},x_{10}},{x_{4},x_{8},x_{10}},{x_{5},x_{6},x_{10}},{x_{6},x_{8},x_{10}},{x_{7},x_{8},x_{10}},{x_{9}}}

For the subset {a_{2},a_{3},a_{4},a_{5},a_{6},a_{7},a_{8},a_{9},a_{10}}, the fuzzy similarity classes are obtained as FR(a_{1}):

FR(a_{1}) = {{x_{1},x_{2},x_{3}},{x_{1},x_{3},x_{5}},{x_{1},x_{5},x_{10}},{x_{3},x_{4}},{x_{4},x_{10}},{x_{5},x_{6},x_{10}},{x_{7},x_{8},x_{10}},{x_{9}}}

The β-approximate classification quality of the subset equals 1, which means that a_{1} can be deleted without affecting the objects classification.

By the same method, the subsets of {a_{3}, a_{4}, a_{5}, a_{6}, a_{7}, a_{8}, a_{9}, a_{10}}, {a_{4}, a_{5}, a_{6}, a_{7}, a_{8}, a_{9}, a_{10}}, {a_{5}, a_{6}, a_{7}, a_{8}, a_{9}, a_{10}}, and {a_{6}, a_{7}, a_{8}, a_{9}, a_{10}} are calculated. It is found that the β-approximate classification quality of the subset {a_{6}, a_{7}, a_{8}, a_{9}, a_{10}} is not equal to 1. This indicates that the attribute a_{5} cannot be deleted.

Finally, one reduct {a_{5}, a_{6}, a_{7}, a_{10}} (Reduct A) is obtained. RS attribute reduction usually yields more than one reduct. Because DO is regarded as an important parameter for assessing water quality, another reduct {a_{2}, a_{5}, a_{6}, a_{7}, a_{8}, a_{9}, a_{10}} (Reduct B) was obtained for comparison with Reduct A.

Because the value α in the fuzzy similarity relation is set by subjective experience, different α values were assigned to obtain other reducts and to examine the effect of α. The reducts {a_{4}, a_{6}, a_{7}, a_{8}} (Reduct C), {a_{3}, a_{6}, a_{7}, a_{8}, a_{9}, a_{10}} (Reduct D), and {a_{4}, a_{5}, a_{6}, a_{7}, a_{9}} (Reduct E) are obtained when α is set to 0.29, 0.28, and 0.27, respectively. The same reduct {a_{4}, a_{5}, a_{6}, a_{7}} (Reduct F) is obtained when α is 0.26 and when α is 0.25.

Using the calculation method in Equation (11), the information entropy of the four parameters can be obtained. Then, according to Equation (12), each parameter is assigned a weight. The information entropy and weight of each parameter are given in the table below.

Weights of parameters calculated by entropy method.

Parameters | Information Entropy | Weight |
---|---|---|
BOD_{5} | 0.8617 | 0.3701 |
NH_{3}-N | 0.8579 | 0.3802 |
TP | 0.9528 | 0.1263 |
Fecal coliforms | 0.9539 | 0.1234 |
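The weights in the table can be reproduced from the listed entropies if each weight is taken as (1 − H_{j}) divided by the sum of (1 − H_{j}) over the four parameters. The check below uses only the numbers in the table; agreement to about four decimals is the most that can be expected, since the entropies are rounded.

```python
# Information entropies and reported weights copied from the table
ENTROPY = {"BOD5": 0.8617, "NH3-N": 0.8579, "TP": 0.9528, "Fecal coliforms": 0.9539}
REPORTED = {"BOD5": 0.3701, "NH3-N": 0.3802, "TP": 0.1263, "Fecal coliforms": 0.1234}

# Entropy weight: share of each parameter in the total degree of difference
spare = {name: 1.0 - h for name, h in ENTROPY.items()}
total_spare = sum(spare.values())
table_weights = {name: s / total_spare for name, s in spare.items()}

# Largest gap between recomputed and reported weights
weight_gap = max(abs(table_weights[name] - REPORTED[name]) for name in ENTROPY)
```

The two parameters with the lowest entropy, BOD_{5} and NH_{3}-N, carry about three quarters of the total weight.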

After the entropy weights of the four parameters retained by FRS attribute reduction are calculated, ARTM is applied to assess water quality in the Harbin reach of the Songhuajiang River, and the results are listed as Reduct A in the table below. Reduct A comprises BOD_{5}, NH_{3}-N, TP, and fecal coliforms, and Reduct B comprises DO, BOD_{5}, NH_{3}-N, TP, TN, F, and fecal coliforms.

Assessment results of the Harbin reach of the Songhuajiang River.

Methods | Reducts | Jan. | Feb. | Mar. | Apr. | May | Jun. | Jul. | Aug. | Sep. | Oct. |
---|---|---|---|---|---|---|---|---|---|---|---|
With attribute reduction | Reduct A | Ⅲ | Ⅲ | Ⅲ | Ⅲ | Ⅲ | Ⅱ | Ⅱ | Ⅱ | Ⅳ | Ⅱ |
With attribute reduction | Reduct B | Ⅲ | Ⅲ | Ⅲ | Ⅲ | Ⅲ | Ⅱ | Ⅱ | Ⅱ | Ⅳ | Ⅱ |
With attribute reduction | Reduct C | Ⅲ | Ⅲ | Ⅲ | Ⅲ | Ⅱ | Ⅱ | Ⅲ | Ⅲ | Ⅳ | Ⅱ |
With attribute reduction | Reduct D | Ⅲ | Ⅲ | Ⅲ | Ⅲ | Ⅱ | Ⅲ | Ⅲ | Ⅲ | Ⅲ | Ⅲ |
With attribute reduction | Reduct E | Ⅲ | Ⅲ | Ⅲ | Ⅲ | Ⅲ | Ⅱ | Ⅱ | Ⅱ | Ⅲ | Ⅱ |
With attribute reduction | Reduct F | Ⅲ | Ⅱ | Ⅲ | Ⅲ | Ⅲ | Ⅱ | Ⅱ | Ⅱ | Ⅳ | Ⅱ |
Without attribute reduction | – | Ⅲ | Ⅲ | Ⅲ | Ⅲ | Ⅲ | Ⅱ | Ⅲ | Ⅲ | Ⅲ | Ⅱ |

The results with attribute reduction (Reducts A–F) are not exactly the same as those without attribute reduction. Compared with the results without attribute reduction, three objects differ in rank for Reduct A, Reduct B, and Reduct D, two for Reduct C and Reduct E, and four for Reduct F. The differences can be attributed to the selection of the value α.
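The counts quoted above can be checked directly against the assessment table. The sketch below transcribes the monthly ranks (January to October) with ASCII Roman numerals and counts, for each reduct, the months whose rank differs from the result without attribute reduction.

```python
# Monthly ranks, Jan. to Oct., copied from the assessment results table
BASELINE = "III III III III III II III III III II".split()   # without attribute reduction
REDUCTS = {
    "A": "III III III III III II II II IV II".split(),
    "B": "III III III III III II II II IV II".split(),
    "C": "III III III III II II III III IV II".split(),
    "D": "III III III III II III III III III III".split(),
    "E": "III III III III III II II II III II".split(),
    "F": "III II III III III II II II IV II".split(),
}

# Number of months in which each reduct disagrees with the baseline
differences = {
    name: sum(1 for r, b in zip(ranks, BASELINE) if r != b)
    for name, ranks in REDUCTS.items()
}
```

Each reduct therefore agrees with the full ten-parameter assessment in at least six of the ten months.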

The results of Reduct A and Reduct B are exactly the same. Reduct A includes the parameters BOD_{5}, NH_{3}-N, TP, and fecal coliforms, whereas Reduct B includes DO, BOD_{5}, NH_{3}-N, TP, TN, F, and fecal coliforms.

In this study, a fuzzy set was combined with a rough set to perform attribute reduction of water quality parameters, because of the limitations of the pure rough set. An entropy method was used to calculate the parameter weights. The attribute recognition theoretical model was successfully applied to evaluate water quality ranks for the Harbin reach of the Songhuajiang River in China from January to October 2012. The results indicate that water quality in the study area is acceptable. Nevertheless, special attention should be paid to preventing further water pollution. For example, TN is the major pollutant in the study area: TN concentrations exceeded the permissible limit (Rank III) in all ten months, with one month beyond Rank V.

After attribute reduction, the assessment results are almost the same as those before attribute reduction. This shows that fuzzy rough set theory is a reasonable and reliable way to perform attribute reduction. Especially for datasets with a large number of parameters and few objects, the fuzzy rough set can reduce the input space dimensions and the computational complexity. However, some objects still receive different ranks with and without attribute reduction, which can perhaps be attributed to the value α being decided by subjective experience. The assessment results of the five distinct reducts (Reduct A, Reduct C, Reduct D, Reduct E, and Reduct F) are somewhat different from those without attribute reduction, but the differences are acceptable. How to select the value α to obtain reducts is an important open question in this paper and will be discussed in our future study. Although the assessment results with attribute reduction are not perfect and still need improvement, the fuzzy rough set can be regarded as a useful tool for attribute reduction to reduce input space dimensions.

This work was supported by the National Natural Science Foundation of China (No. 51178018 and No. 71031001). The authors would like to thank anonymous referees for their useful comments and valuable suggestions to improve the content and composition substantially.

Work presented here was conceived of, carried out and analyzed by Zhihong Zou, Yan An and Ranran Li.

The authors declare no conflict of interest.