A New Comprehensive Evaluation Method for Water Quality: Improved Fuzzy Support Vector Machine

Abstract: With the pressure of population growth and environmental pollution, it is particularly important to develop and utilize water resources more rationally, safely, and efficiently. Due to safety concerns, the government currently adopts a pessimistic method, single factor assessment, for the evaluation of domestic water quality. With this method, however, it is impossible to grasp the comprehensive pollution status of each area in a timely manner, so effective measures cannot be taken in time to reverse or at least alleviate its deterioration. Thus, the main purpose of this paper is to establish a comprehensive evaluation model of water quality that can provide managers with timely information on water pollution in various regions. After considering various evaluation methods, this paper uses the fuzzy support vector machine (FSVM) method to establish this model. The FSVM method is formed by applying a membership function to the support vector machine. However, the existing membership functions have some shortcomings, so a new membership function is formed by improving them. The model is then tested on artificial data, a UCI dataset, and historical water quality evaluation data. The results show that the improvement is meaningful: the improved fuzzy support vector machine performs well and can deal with noise and outliers well. Thus, the model can completely solve the problem of comprehensive evaluation of water quality.


Introduction
Water is an essential resource for human survival and development, and it has become more and more important because of population growth and worsening environmental pollution [1][2][3][4][5]. Therefore, a model that can distinguish water quality is critical. It helps managers to rationally allocate and utilize water resources, and at the same time brings a more comprehensive understanding of water pollution in various areas.
Existing water quality assessment methods can be broadly divided into three categories: traditional assessment methods, evaluation methods based on fuzzy mathematics, and machine learning methods. For the first category, traditional water quality assessment methods such as single factor assessment, the grading score method, and the function evaluation method use a series of calculations to obtain a comprehensive score to evaluate water quality [6]. However, due to the non-determinism and non-linearity of water pollution, traditional methods cannot accurately describe this complex pollution process. For the second category, since the classification boundaries and pollution levels are both fuzzy phenomena, fuzzy mathematics was applied to water quality assessment. Among these methods, fuzzy comprehensive evaluation [7][8][9] and the gray clustering method [10,11] were widely used. Besides, Yun and Zou combined these two methods for water quality evaluation [12]. However, on the one hand, it takes time and effort for experts to give scores, and on the other hand, the choice of whitening function varies from person to person, so these methods based on fuzzy mathematics are difficult to apply widely. For the third category, because today's machine learning methods can solve nonlinear problems well, these algorithms have been widely used in the field of water quality evaluation. Neural networks, in particular, have been widely used in the evaluation of water quality [13][14][15][16]. However, the neural network algorithm requires a significant number of samples for training. In contrast, the support vector machine (SVM) has good generalization ability and a unique advantage in solving nonlinear, high-dimensional classification problems when only a small number of samples is available [17].
Vladimir Vapnik originally proposed the SVM for the two-class classification problem [18]. It has been applied to classification and prediction problems in various fields such as medicine, engineering, and education. However, as the SVM has been applied to more fields, the performance of the standard SVM method has gradually become insufficient. Therefore, many researchers have optimized it to further improve its performance. In terms of parameter optimization, the ant colony algorithm [19], the genetic algorithm [20], the particle swarm optimization algorithm [21], and others were used to help the SVM search for the optimal parameters g and C faster. In addition, in terms of model building, Fei and Liu provided a new binary tree-based SVM algorithm, which can improve classification efficiency [22]. Liu et al. proposed an efficient self-adaption instance selection algorithm to reconstruct the training set of the support vector machine [23]. Li et al. used Adaboost to improve SVM [24]. Suykens et al. proposed a least squares support vector machine, which can reduce the computational complexity [25].
As a result, SVM is widely used in water quality evaluation, and its performance is constantly being improved. Zhou et al. used SVM to evaluate the water quality data of the Wei River, and proposed a self-adaptive parameter optimization method using a float genetic algorithm [26]. Chen et al. used SVM to evaluate the groundwater quality of the Yangmaowan Irrigated Area [27]. Dai used an intelligent genetic algorithm to select the parameters of the least squares support vector machine (LS-SVM), and then used this model to classify and predict the water quality of the Changjiang River [28].
However, this paper finds that the problem of noise points that may be formed during water quality evaluation has not been solved. Fortunately, the fuzzy support vector machine (FSVM) method has been proposed. It can handle noise and outliers by applying the concept of a membership function in the support vector machine [29]. Applying it to water quality assessment can solve this problem.
Besides, the core of FSVM, the membership function, is also constantly being improved to better identify noise. Ren constructed a new membership function through the geometric mean of two membership functions, and found that this method can improve the classification performance [30]. Wu defined an adjustment factor based on the classification hyperplane and the classification interval [31]. Xu proposed two definitions of the membership function: one uses two intra-class hyperplanes to define the membership degree, and the other considers both the distance-based membership function and a new compactness-based membership function [32]. However, after analyzing the membership functions in the above articles, this paper finds that these functions have problems in some cases.
Therefore, the main purposes of this study are (1) to form a new membership function by correcting the above defects; (2) to construct the FSVM water quality evaluation model based on this membership function; and (3) to verify the performance of this model and apply it to real cases. This study aims to establish a reasonable and efficient comprehensive model of water quality assessment, with which managers can obtain timely water pollution information for each region.

Data Imbalance
For water quality assessment, in order to strengthen the management of the surface water environment, prevent water pollution, and protect human health, the "Surface Water Environmental Quality Standard" (GB3838-2002) promulgated by the State Environmental Protection Administration of China is used as the national quality standard in China. According to it, water quality is divided into five categories from I to V. In reality, however, in a small number of areas some indicator values exceed the limits of this standard, and such water is evaluated as inferior V. Obviously, there is less data for the inferior V class than for the other classes. As a result, the dataset is not balanced.
Therefore, this problem needs to be solved by an oversampling or undersampling method. In 2002, Chawla et al. proposed the synthetic minority over-sampling technique (SMOTE), which solves the data imbalance problem by randomly selecting a point from the k neighbors of one sample in the minority class and generating a new sample between the original sample and this selected sample [33]. However, this method may also introduce additional noise, so a modified approach (MSMOTE) was proposed by Hu et al. in 2009 [34]. It first divides the minority class samples into safety samples, boundary samples, and potential noise, and then oversamples the safety samples. In general, MSMOTE can solve the data imbalance problem well, but when there are several minority classes at the same time, the problem can only be solved by the undersampling method.
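The interpolation step of SMOTE described above can be illustrated with a minimal Python sketch. It is not the implementation used in this paper: the function name `smote_sample`, the default `k`, and the use of squared Euclidean distance are assumptions made for illustration, and the MSMOTE safety/boundary/noise split is not included.

```python
import random

def smote_sample(minority, k=3, rng=None):
    """Generate one synthetic sample from a list of minority-class points.

    Hypothetical helper: needs at least two minority points.
    """
    rng = rng or random.Random()
    base = rng.choice(minority)
    # k nearest neighbours of `base` among the other minority points
    others = [p for p in minority if p is not base]
    others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
    neighbour = rng.choice(others[:k])
    # The new point lies on the segment between `base` and the chosen neighbour
    t = rng.random()
    return [a + t * (b - a) for a, b in zip(base, neighbour)]
```

Because every synthetic point is a convex combination of two existing minority samples, it always stays inside the region already covered by that class.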
Of course, data balancing may not improve the accuracy; sometimes the accuracy is even slightly reduced. For the training data of water quality assessment, it is very likely that the amount of data in inferior V is very small, while many samples belong to class II or class III. After all, excellent water resources are a minority, as are severely polluted areas. Then, if the classifier for class II and inferior V judges that all the samples belong to the former, the accuracy rate can be very high, but in fact the generalization ability of this model is very poor: it has high accuracy only on its own testing set. Therefore, once the above methods are used to balance the dataset, the classification accuracy is likely to decrease, but the generalization ability of the model will increase.
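A small worked example makes this effect concrete. The class counts below are hypothetical and chosen only for illustration; they are not taken from the paper's dataset.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, cls):
    """Fraction of true `cls` samples that were predicted as `cls`."""
    hits = [p == cls for t, p in zip(y_true, y_pred) if t == cls]
    return sum(hits) / len(hits)

# Hypothetical test set: 95 class II samples and 5 inferior V samples
y_true = ["II"] * 95 + ["inferior V"] * 5
y_pred = ["II"] * 100  # degenerate classifier that always predicts class II

print(accuracy(y_true, y_pred))              # 0.95
print(recall(y_true, y_pred, "inferior V"))  # 0.0
```

The overall accuracy looks high even though no inferior V sample is ever detected, which is why accuracy alone can be misleading on an unbalanced dataset.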

Data Normalization
The water quality data has different dimensions for different indicators. For example, the pH value generally lies in the range of 6-9, while the ammonia nitrogen value is mostly between 0 and 1. So, the data needs to be normalized first. In this paper, the mapminmax function in Matlab is used for normalization, and all data is normalized to [0, 1]. The formula that is used by mapminmax is shown in Equation (1).

$$y_i = \frac{(y_{\max} - y_{\min})(x_i - x_{\min})}{x_{\max} - x_{\min}} + y_{\min} \tag{1}$$

where $x_{\min}$ and $x_{\max}$ are the minimum and maximum of the indicator over the data, and the target bounds $y_{\max}$ and $y_{\min}$ need to be set. Here, as the range of normalization is [0, 1], we have $y_{\max} = 1$ and $y_{\min} = 0$.
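For readers without Matlab, min-max normalization of one indicator to a target range can be sketched in a few lines of Python. The function name `min_max_normalize` and the handling of a constant column are assumptions of this sketch and are not claimed to match the behavior of mapminmax in that edge case.

```python
def min_max_normalize(values, y_min=0.0, y_max=1.0):
    """Linearly rescale one indicator so that its minimum maps to y_min
    and its maximum maps to y_max."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        # Assumption: a constant column carries no information, map it to y_min
        return [y_min for _ in values]
    return [(x - x_min) / (x_max - x_min) * (y_max - y_min) + y_min
            for x in values]

ph = [6.0, 7.5, 9.0]  # illustrative pH readings
print(min_max_normalize(ph))  # [0.0, 0.5, 1.0]
```

Each indicator is normalized separately, so that pH and ammonia nitrogen end up on the same scale before training.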

Basic Model Selection
According to the "Surface Water Environmental Quality Standards" (GB3838-2002), the classification criteria of three indicators are shown in Table 1. Now, suppose that the situation is simplified into a two-dimensional plane; then the location of samples belonging to class I and II is shown in Figure 1. The points in the lower left and upper right areas unquestionably belong to a standard water quality class, but in reality, there must be sample points in the lower right and upper left areas of the figure. For these points, the value of one indicator satisfies the Class I standard, and the value of the other indicator does not. This situation cannot be evaluated by the water quality standard alone. Therefore, it is necessary for experts to evaluate them according to the actual situation. Because of differences of opinion, noise that may interfere with the classification model will be generated. Therefore, the water quality evaluation model of this paper is built with the FSVM method to reduce noise.

Why can FSVM remove the effects of noise points? In the standard SVM model, the classification problem may overfit at the beginning, that is, the model requires a perfect separation of the two types of samples, which would result in the situation of Figure 2b instead of Figure 2a. Therefore, it is necessary to relax the original condition $y_i(w^T z_i + b) \ge 1$ by the slack variable, that is, the value of $y_i(w^T z_i + b)$ of some points can be allowed to be less than 1, but this relaxation still requires a certain limit. Therefore, the sum of all the slack variables $\varepsilon_i$ needs to be kept to a minimum. At the same time, the penalty parameter C is applied to control this degree of rigor. So, Equation (2) is formed.

$$\min_{w,b,\varepsilon}\ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}\varepsilon_i \quad \text{s.t.}\quad y_i(w^T z_i + b) \ge 1 - \varepsilon_i,\ \ \varepsilon_i \ge 0,\ \ i = 1, 2, \ldots, n \tag{2}$$

But, since C is a constant, the SVM model gives the same degree of punishment to each sample after softening the boundary. Therefore, the FSVM gives different importance to different samples to solve this problem, that is, it transforms the sample set U by the membership degree $S_i$ into $U' = \{(x_i, y_i, S_i),\ i = 1, 2, \ldots, n\}$, where $x_i \in R^m$, $y_i \in \{+1, -1\}$, $S_i \in (0, 1)$. Thus, the previous SVM model is changed into Equation (3).

$$\min_{w,b,\varepsilon}\ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n} S_i\,\varepsilon_i \quad \text{s.t.}\quad y_i(w^T z_i + b) \ge 1 - \varepsilon_i,\ \ \varepsilon_i \ge 0,\ \ i = 1, 2, \ldots, n \tag{3}$$

Convert the above problem into Equation (4) using the Lagrangian function, with multipliers $\alpha_i \ge 0$ and $\beta_i \ge 0$:

$$L(w, b, \varepsilon, \alpha, \beta) = \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n} S_i\,\varepsilon_i - \sum_{i=1}^{n}\alpha_i\left[y_i(w^T z_i + b) - 1 + \varepsilon_i\right] - \sum_{i=1}^{n}\beta_i\,\varepsilon_i \tag{4}$$

Then, convert it to the dual problem:

$$\max_{\alpha \ge 0,\ \beta \ge 0}\ \min_{w,b,\varepsilon}\ L(w, b, \varepsilon, \alpha, \beta) \tag{5}$$

Since $\frac{\partial L}{\partial \varepsilon_i} = S_i \cdot C - \alpha_i - \beta_i = 0$, the original form becomes:

$$\max_{0 \le \alpha_i \le S_i C}\ \min_{w,b}\ \frac{1}{2}\|w\|^2 + \sum_{i=1}^{n}\alpha_i\left[1 - y_i(w^T z_i + b)\right] \tag{6}$$

Since $\frac{\partial L}{\partial b} = 0$ and $\frac{\partial L}{\partial w} = 0$, the above formula can be reduced to a problem containing only the unknowns $\alpha_i$:

$$\max_{\alpha}\ \sum_{i=1}^{n}\alpha_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j y_i y_j\, z_i^T z_j \quad \text{s.t.}\quad \sum_{i=1}^{n}\alpha_i y_i = 0,\ \ 0 \le \alpha_i \le S_i C \tag{7}$$

It can be seen that since the C value is the same for all the sample points, the points whose $S_i$ is larger are less likely to be misclassified, and the points whose $S_i$ is smaller have less effect on the formation of the optimal hyperplane. Therefore, by assigning different $S_i$ values to different sample points, the influence of sample noise can be reduced.
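The effect of the membership degree on the penalty term can be checked numerically with a minimal Python sketch of the FSVM primal objective. It assumes a linear kernel (so that $z_i = x_i$) and a fixed, hand-picked hyperplane; the function name `fsvm_objective` and the toy data are illustrative assumptions, not part of the paper.

```python
def fsvm_objective(w, b, X, y, S, C):
    """Primal FSVM objective 0.5*||w||^2 + C * sum_i S_i * eps_i,
    where eps_i = max(0, 1 - y_i * (w . x_i + b)) is the slack of sample i."""
    reg = 0.5 * sum(wj * wj for wj in w)
    penalty = 0.0
    for xi, yi, si in zip(X, y, S):
        margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
        eps = max(0.0, 1.0 - margin)  # zero for points outside the margin
        penalty += si * eps
    return reg + C * penalty

# Two clean points and one mislabeled point at x = -1 carrying the label +1
X = [[2.0, 0.0], [-2.0, 0.0], [-1.0, 0.0]]
y = [1, -1, 1]
w, b, C = [1.0, 0.0], 0.0, 10.0

print(fsvm_objective(w, b, X, y, [1.0, 1.0, 1.0], C))  # 20.5 (plain SVM)
print(fsvm_objective(w, b, X, y, [1.0, 1.0, 0.1], C))  # 2.5 (noise down-weighted)
```

With $S_i = 1$ for every sample the objective reduces to the standard soft-margin SVM, and the mislabeled point dominates the cost. Giving it a small membership shrinks its contribution, so the optimizer has less incentive to bend the hyperplane toward it.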

Parameter Optimization
The penalty parameter C indicates the degree of punishment for misclassified samples, so the larger C is, the smaller the training error will be. However, if C is too large, this will lead to overfitting, so a suitable parameter C is needed. In this paper, the RBF kernel function is selected, so the size of the parameter g affects the complexity of the optimal classification plane. For parameter optimization, this paper adopts the grid search method: each pair of (C, g) values on the grid is run in the model in turn, and after all of the grid points have been traversed, the best C and g are obtained.
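The grid traversal can be sketched as follows. The scoring callback `score(C, g)` stands for whatever validation accuracy the model produces for a given pair, and the exponential grid is a common choice for RBF-kernel SVMs rather than the grid used in this paper; both are assumptions of this sketch.

```python
from itertools import product

def grid_search(score, c_grid, g_grid):
    """Evaluate every (C, g) pair and return (best_score, best_C, best_g)."""
    best = None
    for C, g in product(c_grid, g_grid):
        s = score(C, g)  # e.g. mean cross-validated accuracy of the model
        if best is None or s > best[0]:
            best = (s, C, g)
    return best

# Hypothetical exponential grid: 2^-5, 2^-4, ..., 2^5 for both parameters
c_grid = [2.0 ** k for k in range(-5, 6)]
g_grid = [2.0 ** k for k in range(-5, 6)]
```

The cost grows with the product of the two grid sizes, which is why a coarse grid is often searched first and then refined around the best point.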

Cross-Validation
For a classification model, after training with historical data, samples with unknown results can be put into the model to obtain prediction results. But this gives no way to verify the performance of the classifier itself. Thus, the sample set with known results is generally divided into two parts: one part is used to train the model and the other part is used to test the performance of the trained model [35].
The commonly used cross-validation methods are as follows:
(1) K-cv: The data is divided into k groups; each time, one of them is selected for testing, and the remaining (k − 1) groups are used to train the model. This process is repeated k times. Finally, the evaluation result is generated by taking the average of all the results.
(2) Loo-cv: Suppose there are n samples; each time, (n − 1) samples are used for training, and the remaining one is used as a test set. The above process is repeated n times, and the final result is the average of all values. Since almost every sample is used for training each time, this method wastes almost no information and the results are more reliable. However, because it is repeated so many times, the method is time consuming.
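Both schemes can be expressed with one index-splitting helper, since Loo-cv is K-cv with k = n. The sketch below is illustrative only: the function names are hypothetical, the folds are interleaved without shuffling, and `evaluate` stands for any routine that trains on one index list and returns a score on the other.

```python
def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) pairs; every index is tested exactly once."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

def cross_validate(n, k, evaluate):
    """Average the score returned by evaluate(train_idx, test_idx) over k folds."""
    scores = [evaluate(tr, te) for tr, te in k_fold_indices(n, k)]
    return sum(scores) / len(scores)
```

Calling `k_fold_indices(n, n)` produces the n leave-one-out splits, which shows directly why Loo-cv requires n training runs.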

Multi-Classification Model
Since the water quality data is divided into six categories, the structure of the model used in this paper is as shown in Figure 3. First, step (1) is performed, that is, the training data of each pair of classes is put into one of the classifiers 1-15 for training. Then, step (2) is performed, that is, a new sample with an unknown result is put into each of the 15 classifiers for evaluation. Finally, step (3) summarizes the classification results of all classifiers to obtain the final prediction result.
Step (1): The two-class classification model uses the FSVM method, and the training dataset is put into the model to calculate the values of w and b.
Step (2): For a sample x with an unknown result, since $w = \sum_{i=1}^{n} \alpha_i y_i z_i$, the quantities $y_i$, $\alpha_i$, $x_i$, and $b$ are all known. Therefore, the classification function takes the form of Equation (8), in which the kernel function is the Gaussian kernel $K(x_i, x_j) = \exp(-g\|x_i - x_j\|^2)$.

$$f(x) = \operatorname{sign}\left(\sum_{i=1}^{n} \alpha_i y_i K(x_i, x) + b\right) \tag{8}$$

Step (3): This paper adopts the one-to-one (one-versus-one) method [36]. Whenever a new sample enters the model, each of the classifiers 1-15 gives a result. In the end, the category that receives the most votes is the evaluation result of the sample.
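The voting in step (3) can be sketched in Python as follows. The class labels, the dictionary of pairwise classifiers, and the name `ovo_predict` are assumptions for illustration; each pairwise classifier is treated as a black-box function, and ties are resolved in favor of the class that received its first vote earliest, which is a choice of this sketch rather than of the paper.

```python
from collections import Counter
from itertools import combinations

CLASSES = ["I", "II", "III", "IV", "V", "inferior V"]
PAIRS = list(combinations(CLASSES, 2))  # 15 class pairs for 6 classes

def ovo_predict(x, binary_classifiers):
    """Return the class with the most votes.

    binary_classifiers maps each (class_a, class_b) pair to a function
    that takes x and returns either class_a or class_b.
    """
    votes = Counter(binary_classifiers[pair](x) for pair in PAIRS)
    return votes.most_common(1)[0][0]
```

Six classes give 6 × 5 / 2 = 15 pairs, which matches the 15 classifiers in the model structure.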
The above is a complete classification model for water quality assessment, but the optimization problem in step (1) still needs to be solved by complicated methods. In 1998, the SMO algorithm invented by John Platt solved this problem [37]. This article uses the SMO algorithm to solve the SVM.
In order to solve the optimization problem in $(\alpha_1, \ldots, \alpha_n)$ described by Equation (7), SMO decomposes it into several sub-problems that each solve for only two parameters. The outer loop traverses the entire set and finds an $\alpha_i$ that violates the KKT conditions as the first point $\alpha_1$; an inner loop then finds the point $\alpha_2$ that maximizes $|E_1 - E_2|$. Then, some parameters are updated, the value of b is found, and finally the stop condition is checked. If it is not satisfied, the loop continues.
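Two pieces of this loop can be sketched in isolation: the choice of the second multiplier, and the interval to which its updated value must be clipped. The sketch assumes per-sample upper bounds $C_i = S_i \cdot C$, which follow from $S_i \cdot C - \alpha_i - \beta_i = 0$ with $\beta_i \ge 0$; the function names are hypothetical and the full update of $\alpha$ and b is omitted.

```python
def select_second(i, errors):
    """Inner-loop heuristic: pick j != i that maximizes |E_i - E_j|."""
    candidates = [j for j in range(len(errors)) if j != i]
    return max(candidates, key=lambda j: abs(errors[i] - errors[j]))

def clip_bounds(alpha_i, alpha_j, y_i, y_j, C_i, C_j):
    """Feasible interval [L, H] for the new alpha_j.

    Keeps y_i*alpha_i + y_j*alpha_j constant while enforcing
    0 <= alpha_i <= C_i and 0 <= alpha_j <= C_j.
    """
    if y_i != y_j:
        # alpha_i - alpha_j stays constant
        L = max(0.0, alpha_j - alpha_i)
        H = min(C_j, C_i + alpha_j - alpha_i)
    else:
        # alpha_i + alpha_j stays constant
        L = max(0.0, alpha_i + alpha_j - C_i)
        H = min(C_j, alpha_i + alpha_j)
    return L, H
```

With $C_i = C_j = C$ these bounds reduce to the ones used in standard SMO, so the only change FSVM requires in this step is the per-sample box constraint.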

Improved Membership Function
At present, the basic framework of the water quality evaluation model is already available, but the membership function, the core of the FSVM model, has not yet been determined. Because some existing membership functions still have certain defects, this part analyzes their problems, improves on them, and finally proposes the improved membership function of this paper.

Membership Function
The membership function itself is a mapping between a sample set and the range (0, 1), which indicates the extent to which a sample belongs to a certain situation. Here are a few common membership functions:
(1) Triangular membership function
(2) Trapezoidal membership function
(3) Distance-based membership function in FSVM, where $x_+$ and $x_-$ are the centers of the positive and negative samples, respectively.
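For orientation, the three kinds of function listed above can be sketched in Python using their common textbook forms. These are assumptions of this sketch and may differ in detail from the paper's own equations: the breakpoints a, b, c, d are free parameters, and the distance-based form shown is the widely used one that decreases linearly with the distance to the class center, with a small δ keeping the value above zero.

```python
import math

def triangular(x, a, b, c):
    """Rises linearly from a to the peak at b, then falls to c (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def trapezoidal(x, a, b, c, d):
    """Rises on [a, b], equals 1 on [b, c], falls on [c, d] (a < b <= c < d)."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return 1.0 if x <= c else (d - x) / (d - c)

def distance_membership(x, center, radius, delta=1e-6):
    """Distance-to-class-center form: 1 - ||x - center|| / (radius + delta)."""
    return 1.0 - math.dist(x, center) / (radius + delta)
```

In the distance-based form, `center` would be $x_+$ for a positive sample and $x_-$ for a negative one, and `radius` the largest distance from that center to any sample of the same class.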

Basic Form of Membership Function
Many studies have put forward their own ideas on the issue of the membership function. In [30], the author used the positive and negative class radii ($r_+$, $r_-$), the distance from the point to the center of the class ($d_{i+}$, $d_{i-}$), and the distance from the center of the positive class to the center of the negative class ($d_{+-}$) to construct the membership function $\mu_i$.

In [32], the author also proposed two new membership functions. One is a combination of the traditional membership function based on distance and a membership function based on compactness. The membership function based on distance is the same as Equation (14), and the membership function based on compactness selects the nearest p points around the sample point $x_i$. When none of the p samples belong to the class of $x_i$, $x_i$ is judged as noise, which has no effect on the formation of the classification plane, so the value of $e_i$ is δ. When all of the p samples belong to the class of $x_i$, the value given to $e_i$ is a sum over the p neighbors divided by p. When q of the p samples belong to the class of $x_i$, and the rest do not belong to this class, the value of $e_i$ depends on q.

The other one is based on the intra-class hyperplane. It only considers the points inside the two intra-class hyperplanes. The membership of the outer points is directly defined as a very small positive number δ. The membership of the inner points is defined according to the distance $d_{i+}$ or $d_{i-}$ from the point to the hyperplane of its class and the distance D between the two hyperplanes.

In addition to satisfying the requirements of the basic value range (0, 1), the membership function required by FSVM mainly needs to support noise reduction and to facilitate the formation of classification planes. Many articles, including [30,31], construct a hypersphere based on the class center and design a membership function based on the distance from the sample point to the class center. To some extent, this method relies heavily on the geometry of the sample distribution. For example, in the case of Figure 4, the two points A and B contribute the same to the construction of the classification plane, but because the two points lie at different distances from their class centers, the membership values calculated by the class-center method are different.

Thus, the membership function considers only the set U, which includes the sample points inside the two hyperplanes $I_+$ and $I_-$. The sample points outside the hyperplanes are no longer considered, because they do not help in the determination of the optimal hyperplane. The above content is described in Equation (18).
where δ is a very small positive number.
(2) The problem of distance-based membership function

The distance-based membership functions that are designed in [30,32] decrease with the distance $d_i$, that is, the closer to the center of the class, the larger the value. This design is mainly based on the idea that the closer the point is to the class center, the more it should belong to this class. However, a point outside the boundary of the classification plane satisfies the condition $w^T x + b > 1$, and its slack variable is $\varepsilon_i = 0$, so the value of C does not affect the classification result. Moreover, such a method makes the $S_i$ of a useless point that is closer to the center of the class bigger than the $S_i$ of a point that may be a support vector. Therefore, this paper adopts the idea of the membership function based on the intra-class hyperplane in [32], and it gives larger function values to the samples that are closer to the boundary zone between the two classes.

(3) The problem of compactness-based membership function

In [30], the author used a parameter λ ∈ (0, 1) to solve the problem of noise reduction. When the value of $d_i$ is bigger than the product of λ and the distance D between the two hyperplanes, $S_i = \delta$ (δ is a small positive number). However, the situation in Figure 6 may occur, in which the final classification plane is not parallel to the two hyperplanes. Here, $d_A$ or $d_B$ is the distance from point A or B to the intra-class hyperplane. Assuming that the chosen λ gives $d_A > \lambda \cdot D$, the noise point A is successfully excluded; but point B, which satisfies $d_A = d_B$, should be a support vector. Since $d_B > \lambda \cdot D$ is also satisfied, B is also treated as noise. Therefore, there may be some problems with this method.

In [32], the author constructed a membership function based on compactness. When q samples of the nearest p neighbors of the sample $x_i$ are not in the same class as $x_i$, the centripetal degree is defined as described above. However, as shown in Figure 7, if p = 5, and the five white points that are shown in Figure 7e,f are the nearest five neighbors of points A and B, the point A in Figure 7a has a certain effect on the formation of the classification plane, but the point B in Figure 7b is obviously a noise point. Yet, according to the method of [32], A and B have the same membership.

Improvement Ideas
When the neighbors of a sample point are all of the same class as this point, or all of a different class, the case can be determined directly. However, there are two main situations for areas where the positive and negative sample points are mixed together. One is that the points are close to the junction area of the two classes, so most of the samples in this area should be given correspondingly large values. The other is that the point is a noise point inside a certain class, but the categories of its neighbors are not all different from it, so this situation is different from the case where all of the surrounding points are of a different class.
At the beginning, this paper uses the category of the nearest neighbor as the criterion for evaluating whether a point in the mixed area is noise. If the class of the sample point closest to it is the same as its class, it is judged to be a useful point. If the class of the sample point closest to it is different from its class, the point is determined to be noise, namely,
(1) When none of the p sample points are in the class of $x_i$, $x_i$ is noise and has no effect on the formation of the classification plane, so the value of $c_i$ is δ.
(2) When all the p sample points are of the same class as $x_i$, the function is designed according to the degree of compactness of the points around $x_i$, so the value of $c_i$ is a sum over $j = 1, \ldots, p$.
(3) When only q of the p points belong to the same class as $x_i$, $c_i$ takes the value δ or a sum over $j = 1, \ldots, q$, according to the class of its nearest neighbor.
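The decision logic of this initial criterion can be sketched in Python. The sketch only returns a label ("noise" or "useful") for a sample and does not compute the value of $c_i$; the function names, the Euclidean distance, and the toy data in the usage note are assumptions for illustration.

```python
import math

def nearest_neighbours(i, X, p):
    """Indices of the p samples closest to sample i, nearest first."""
    order = sorted((j for j in range(len(X)) if j != i),
                   key=lambda j: math.dist(X[i], X[j]))
    return order[:p]

def initial_rule(i, X, y, p):
    """Judge sample i as 'noise' or 'useful' under the initial criterion."""
    nn = nearest_neighbours(i, X, p)
    same = [j for j in nn if y[j] == y[i]]
    if not same:               # case (1): all p neighbours are different-class
        return "noise"
    if len(same) == len(nn):   # case (2): all p neighbours are same-class
        return "useful"
    # case (3): mixed neighbourhood, decided by the single nearest neighbour
    return "useful" if y[nn[0]] == y[i] else "noise"
```

For instance, with one-dimensional samples `X = [[0.0], [0.1], [0.2], [0.3], [1.0], [1.1], [1.2], [0.15]]`, labels `y = [1, 1, 1, 1, -1, -1, -1, -1]`, and p = 3, the stray negative sample at 0.15 is labeled noise by case (1), but the positive sample at 0.1 is also labeled noise, only because that stray sample happens to be its nearest neighbor.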
But, in fact, a counterexample like Figure 8 can be given. If p = 5, what appears under normal conditions should be similar to the cases of Figure 8a,c. Since they are close to the classification plane, the two points A and C do not belong to case (1) or (2). Then, according to the class of the nearest neighbor, this method determines that point A is noise, and point C is useful for determining the classification plane.
However, it is inaccurate to rely only on whether the nearest neighbor of $x_i$ is of the same class as $x_i$. The reason is that although the model in this article can deal with those noise points, they also interfere with the judgment of the surrounding points. For example, there is a nearest different-class neighbor next to point B. Point B will be judged as noise by this method, but this is not actually the case. Similarly, when two noise points are very close, this method will instead judge point D as a point that contributes to the classification plane. In both cases, other noise points interfere with the determination of the adjacent sample points.
Thus, the situation similar to that in Figure 8b is solved first. Although some of the noise isolated among different-class points is removed by the previous case (1), a different-class point itself still interferes with the determination of the surrounding points. Therefore, this paper treats the judgment of case (1) as a prior step. After finding the noise points whose neighbors are all different-class points, the discrimination of case (2) and case (3) no longer considers these noise points.
In addition, to handle situations like Figure 8d, a second influencing factor must be considered: the numbers of same-class and different-class points. When a different-class point is the closest neighbor and the surrounding different-class points outnumber the same-class points, the point is judged as noise (such as point A). When a different-class point is the closest neighbor but the same-class points outnumber the different-class points, the point is judged to be a useful point (such as point B). When a same-class point is the closest neighbor and the same-class points outnumber the different-class points, the point is judged to be a useful point (such as point C). When a same-class point is the closest neighbor but the different-class points outnumber the same-class points, the point is judged as noise (such as point D).
However, in cases A to D discussed above, whenever the surrounding different-class points outnumber the same-class points, the point is judged as noise regardless of the class of its closest neighbor. Thus, the result of this judgment is exactly the same as that of a method that considers only the numbers of same-class and different-class points, and the class of the nearest neighbor carries no information. Therefore, noise can be determined directly from the numbers of different-class and same-class neighbors.
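The equivalence stated above can be checked by enumerating the four cases A to D. The following Python sketch is illustrative only; the function names `combined_rule` and `majority_rule` are hypothetical and are not taken from the paper.

```python
from itertools import product

def combined_rule(nearest_same: bool, same_majority: bool) -> str:
    """Rule that uses both the nearest neighbor's class and the neighbor counts."""
    if not nearest_same and not same_majority:
        return "noise"   # point A
    if not nearest_same and same_majority:
        return "useful"  # point B
    if nearest_same and same_majority:
        return "useful"  # point C
    return "noise"       # point D

def majority_rule(same_majority: bool) -> str:
    """Rule that looks only at same-class versus different-class counts."""
    return "useful" if same_majority else "noise"

# Compare the two rules on all four combinations of the two factors.
rules_agree = all(
    combined_rule(nearest_same, same_majority) == majority_rule(same_majority)
    for nearest_same, same_majority in product([True, False], repeat=2)
)
print(rules_agree)  # True
```

Since the two rules agree on every combination, the class of the nearest neighbor never changes the outcome in these four cases.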
If a counterexample is also sought for this classification method, the normal situation should be similar to the cases of Figure 9b,c. That is, when most neighbors of a point belong to its own class, it should be a normal point; when most of its neighbors belong to a different class, it should be noise.
However, the different-class points among the neighbors of point A in Figure 9a outnumber the same-class points, yet A is not a noise point. Similarly, the same-class points around point D in Figure 9d outnumber the different-class points, yet D is noise. In fact, this idea, which is similar to k-nearest neighbor, is acceptable. The most important goal of the function $S_{i2}$ is to find the noise points, so it is acceptable to identify as noise a point whose neighbors are mostly of a different class. However, this judgment is limited by the value of the parameter $p$ and by interference from local phenomena.
Therefore, another factor is added here: the number of points belonging to the same class as $x_i$ among the three nearest neighbors beyond the $p$ nearest neighbors. To a certain extent, this factor can prevent local phenomena from interfering with the judgment.
In this case, when the same-class points among the $p$ neighbors around a point are fewer than the different-class points, the point would be judged as noise. But if two or three of the three nearest neighbors beyond the $p$ points belong to the same class as the point, the situation is only a local phenomenon, so the sample point is judged to be a normal point (such as point A). If only one or none of these three neighbors is in the same class, the sample point is judged to be noise (such as point B). When same-class points are the majority among the $p$ neighbors, the sample point is judged to be noise if only one or none of the three neighbors beyond the $p$ points is in the same class (such as point D), and it is judged to be a normal point if two or three of them are (such as point C). Equation (19) expresses this function, where $t_i = 1$ if two or three of the three additional neighbors belong to the same class as $x_i$, and $t_i = -1$ if only one or none of them does.
However, when $t_i = 1$, the value of $c_i$ is $\delta$ regardless of whether $(q - p/2)$ is positive or negative. Similarly, when $t_i = -1$, the value of $c_i$ is also fixed regardless of the value of $(q - p/2)$. Therefore, the value of the function is determined only by the sign of $t_i$, which is stated as Equation (20).

However, the discriminant of Equation (20), that is, the condition on the three neighbors beyond the $p$ points, is in essence also based on the numbers of same-class and different-class points around the point. The only difference between the two judgment conditions is the value of $p$, so adding the three points is actually meaningless. That is, the problem can be solved by using only the numbers of same-class and different-class points around the point.

Improved Membership Function
Therefore, based on the above analysis, the membership function of this paper combines both distance and compactness. Equation (21) is the distance-based membership function of this article, designed for easy classification, where $t_-$ and $t_+$ are the total numbers of positive and negative sample points inside the two hyperplanes, respectively, and $\delta$ is a small positive number.

When a point is very close to the intra-class hyperplane, it has no effect on the construction of the classification plane, so its membership degree is infinitely close to zero. As a sample point gets closer to the junction zone of the two classes, its contribution to the construction of the classification plane grows, and so does its membership.

However, a single function is clearly insufficient for the model. For example, when a point satisfying the condition $y_i > 0$ is far away from the positive-class hyperplane, it is mixed into the negative-class sample points. It should obviously be a noise point, but the value of the membership function $S_1$ above will be large. The function therefore needs to be adjusted. To this end, this paper designs another membership function to solve the problem of noise and isolated points.
The following is the improved compactness-based membership function proposed in this paper. Here, $d_{ij}$ is the distance from the sample point $x_i$ to each of the $p$ nearest sample points around it, where $p$ is an odd number.
(1) When none of the $p$ sample points is in the same class as $x_i$, $x_i$ is judged as noise and has no effect on the formation of the classification plane, namely, $c_i = \delta$.

(2) Reselect the $p$ sample points closest to $x_i$, excluding the points of case (1). This effectively prevents a single noise point of case (1) from interfering with the judgment of the surrounding sample points.

a. When none of the $p$ sample points selected at this stage is in the same class as $x_i$, $x_i$ is judged as noise and has no effect on the formation of the classification plane, namely, $c_i = \delta$.
b. When all of the $p$ sample points are of the same class as $x_i$, the function is designed according to the compactness of the sample points around $x_i$; that is, the tighter the sample points, the larger the $c_i$: $c_i = \sum_{j=1}^{p} \frac{1}{d_{ij}}$.
c. When $q$ of the $p$ sample points belong to the same class as $x_i$ and the remaining ones do not, the value of $c_i$ depends on $q$, following the idea of k-nearest neighbor.

Finally, the second membership function $S_{i2}$ is formed from $c_i$.

So far, the design of $c_i$ in case (1) and in case a of (2) accomplishes noise reduction. The function value $\sum_{j=1}^{q} \frac{1}{d_{ij}}$ in case b of (2), where $q = p$, excludes the effect of isolated points: when a point is isolated, its $c_i$ is very small. Case c of (2) closes the loopholes left by cases a and b. It uses the idea of k-nearest neighbor so that, with an appropriate parameter $p$, $S_{i2}$ better distinguishes the effect of the sample points in the junction area. At the same time, together with case (1), it also solves the problem left by $S_{i1}$: by identifying noise, $S_{i2}$ cancels the large membership value that $S_{i1}$ assigns to a point merely because it is far away from its intra-class hyperplane.
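As a rough illustration of cases (1) and (2), the sketch below computes an unnormalized $c_i$ for every sample. It rests on several assumptions that are not taken from the paper: distances are plain Euclidean distances in the input space, all points are distinct, the mixed case c is decided by a simple majority among the $p$ reselected neighbors, and the helper name `compactness_values` is hypothetical.

```python
import numpy as np

def compactness_values(X, y, p=3, delta=1e-6):
    """Return an illustrative c_i for every sample (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = len(X)
    # Pairwise Euclidean distances d_ij; a point is never its own neighbor.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)

    def nearest(i, allowed):
        """Indices of the p allowed points closest to sample i."""
        order = np.argsort(dist[i])
        return [j for j in order if allowed[j] and j != i][:p]

    # Case (1): every one of the p nearest neighbors has another class.
    everyone = np.ones(n, dtype=bool)
    prior_noise = np.array(
        [all(y[j] != y[i] for j in nearest(i, everyone)) for i in range(n)]
    )

    c = np.full(n, delta)  # noise points keep the small value delta
    for i in np.flatnonzero(~prior_noise):
        # Case (2): reselect neighbors, ignoring the case (1) noise points.
        same = [j for j in nearest(i, ~prior_noise) if y[j] == y[i]]
        # ASSUMED majority rule for the mixed case c (p is odd, so no ties).
        if len(same) > p / 2:
            c[i] = sum(1.0 / dist[i, j] for j in same)
    return c

# Two tight clusters plus one class-1 point placed inside the class-0 cluster.
X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [0.05, 0.05],
     [1.0, 1.0], [1.1, 1.0], [1.0, 1.1], [1.1, 1.1]]
y = [0, 0, 0, 0, 1, 1, 1, 1, 1]
c = compactness_values(X, y)
print(c)
```

In this toy data, the misplaced point (index 4) receives $\delta$, while the points inside the two clusters receive large values because their same-class neighbors are close.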
Since $S_{i1}$ only considers the importance of points near the classification plane, it cannot handle noise and isolated points. Therefore, this paper constructs $S_{i2}$ to remove noise and isolated points, which makes up for the defects of $S_{i1}$. The final membership function of this paper is then determined as the product $S_{i1} \cdot S_{i2}$.

When the compactness of the neighbors around a sample point is constant, the closer the sample point is to the junction area, the greater its membership degree. When the distance between a sample point and its intra-class hyperplane is constant, the greater the compactness of its neighbors, the greater the membership degree. As for how to combine $S_{i1}$ and $S_{i2}$: because the two functions can compensate for each other under addition, this article does not add them. Since noise should be rejected outright, this paper uses multiplication, which allows no such compensation.

The issues discussed earlier can now be revisited:

(1) For the case of Figure 4, the problem is solved by using the intra-class hyperplane instead of the hypersphere. The value of $S_{i1}$ under the new function is the same for A and B in Figure 4, but different values of $S_{i2}$ may be given depending on the points surrounding A and B. The final result is the combination of $S_{i1}$ and $S_{i2}$.
(2) For the case of Figure 5, the new function judges a sample point by the numbers of same-class and different-class points around it, rather than by its distance to the intra-class hyperplane.
(3) For the problem of Figure 6, the improvement has been shown in the analysis of the situation in Figure 9.
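The preference for multiplication over addition discussed above can be illustrated with two hypothetical points. The numbers below are invented for illustration and do not come from the experiments: a noise point with a large $S_{i1}$ and a near-zero $S_{i2}$, and a useful point with moderate values of both.

```python
def combine_by_sum(s1: float, s2: float) -> float:
    """Additive combination: a large S_i1 can compensate for a tiny S_i2."""
    return s1 + s2

def combine_by_product(s1: float, s2: float) -> float:
    """Multiplicative combination: a tiny S_i2 suppresses the result."""
    return s1 * s2

# Hypothetical membership values (not taken from the paper's data).
noise_point = (0.9, 1e-6)   # far from its intra-class hyperplane, flagged as noise
useful_point = (0.4, 0.4)   # moderately close to the junction area, compact neighbors

sum_prefers_noise = combine_by_sum(*noise_point) > combine_by_sum(*useful_point)
product_prefers_useful = (
    combine_by_product(*useful_point) > combine_by_product(*noise_point)
)
print(sum_prefers_noise, product_prefers_useful)  # True True
```

Under addition the noise point would receive the larger membership, whereas under multiplication its membership is driven toward zero.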
Finally, since the model is to be applied to multi-class classification, the variables have to be converted. These variables include the class centers $\phi(x_+)$ and $\phi(x_-)$, the distance $\|W\|$ between the two class centers, the condition that determines whether a point is inside or outside the hyperplane, the distance $d_{i+}$ or $d_{i-}$ between the point and the intra-class hyperplane, and the distance $d_{ij}$ between sample points.

Data Verification
To verify whether the improved membership function proposed in this paper really achieves the expected effect, the following experiments use an artificial dataset and UCI standard data to verify the model performance.

Experiment Based on Artificial Data
To test the function's ability to recognize noise and outliers in a setting that can be visualized, the improved membership function is first tested on an artificial data set. Ninety sample points whose horizontal and vertical coordinates lie between 0 and 1 are placed randomly on a two-dimensional plane. Points with $x_1$ in (0, 0.5) are taken as negative samples, and points with $x_1$ in (0.5, 1) as positive samples. Then, 10 noise points are randomly placed into the data set, and the data are put into the improved FSVM model. Since the sample set is small, $p$ is set to 3. Since the construction of $S_{i1}$ is relatively simple, the focus is on the value of the function $S_{i2}$. This procedure is repeated ten times and the results are observed. The data distribution of one run is shown in Figure 10, and the final results are shown in Table 2.

It can be seen that among all of the $S_{i2}$ values, 10 samples have a value of $6.17 \times 10^{-6}$; these are exactly the noise points placed previously. Another point, (0.36, 0.14), can be seen from the figure to be an isolated point, so its $S_{i2}$ value is only 0.0630. Therefore, the improved FSVM handles these 11 points that are not useful for classification very well. The other runs give the same results.

... 2016 reached 83.81 billion cubic meters, while the total amount of wastewater discharged was 17.42 billion tons. For this reason, it is of great significance to carry out a water quality assessment of the Pearl River, which is large in both water consumption and wastewater discharge [38].
This paper selects the automatic monitoring data of the Pearl River Basin from 2012 to June 2018; a total of 2633 records were deleted after all the null values were removed. Since the pollution of the Pearl River is mainly organic, this paper selects four conventional evaluation indicators: pH value, chemical oxygen demand (CODMn), dissolved oxygen (DO), and ammonia nitrogen (NH3-N). As shown in Table 6, the distribution of the dataset is very unbalanced, and there are several minority classes, that is, several classes whose number of samples is much smaller than that of the class with the most samples. Therefore, undersampling is performed first. The data are then put into the FSVM model based on the improved membership function of this paper.
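The text does not specify which undersampling scheme is used. As one possible reading, the sketch below performs plain random undersampling, in which every class is reduced to the size of the smallest class. The function name `random_undersample`, the target size, and the toy labels are all assumptions made for illustration.

```python
import numpy as np

def random_undersample(X, y, seed=0):
    """Randomly keep the same number of samples from every class (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X)
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.min()  # ASSUMPTION: shrink every class to the smallest class size
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == cls), size=target, replace=False)
        for cls in classes
    ])
    keep.sort()  # preserve the original record order
    return X[keep], y[keep]

# Toy unbalanced data: 10, 3 and 5 samples in three classes.
y_demo = np.array([0] * 10 + [1] * 3 + [2] * 5)
X_demo = np.arange(len(y_demo)).reshape(-1, 1)
X_bal, y_bal = random_undersample(X_demo, y_demo)
print(np.unique(y_bal, return_counts=True))
```

With several very small classes, reducing every class to the smallest size may discard a great deal of data, so a larger target size per class could be a reasonable alternative.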

Analysis and Comparison of Evaluation Results
According to the cross-validation method, 80% of the data is used as the training set and 20% as the testing set. Classification is carried out by the FSVM model of this paper, and the results are compared with those of the single factor evaluation (SFE). Some of the results that differ are shown in Table 7. The current domestic water quality assessment generally adopts the single factor evaluation method, which determines the degree to which each indicator exceeds the standard and uses the worst one as the sample's water quality evaluation result [39]. The advantage of this method is that the calculation is very simple, and because it adopts the principle of pessimism, it is very suitable for the evaluation of drinking water, for which safety is of great importance.
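The single factor evaluation described above can be sketched in a few lines. The class limits below are assumed values used only for illustration; the standard actually applied in the paper is the one excerpted in Table 1. The pH indicator is omitted for brevity, and the function names are hypothetical.

```python
# Class limits for Classes I to V. These numbers are ASSUMED for illustration;
# the standard actually used in the paper is the one excerpted in Table 1.
DO_LIMITS = [7.5, 6.0, 5.0, 3.0, 2.0]        # dissolved oxygen, mg/L (higher is better)
CODMN_LIMITS = [2.0, 4.0, 6.0, 10.0, 15.0]   # CODMn, mg/L (lower is better)
NH3N_LIMITS = [0.15, 0.5, 1.0, 1.5, 2.0]     # NH3-N, mg/L (lower is better)

def indicator_class(value, limits, higher_is_better=False):
    """Return 1..5 for Classes I..V, or 6 when even the Class V limit is not met."""
    for cls, limit in enumerate(limits, start=1):
        meets = value >= limit if higher_is_better else value <= limit
        if meets:
            return cls
    return len(limits) + 1

def single_factor_class(do, codmn, nh3n):
    """Pessimistic rule: the worst indicator decides the class of the sample."""
    return max(
        indicator_class(do, DO_LIMITS, higher_is_better=True),
        indicator_class(codmn, CODMN_LIMITS),
        indicator_class(nh3n, NH3N_LIMITS),
    )

# A hypothetical sample whose ammonia nitrogen alone exceeds the Class I limit.
print(single_factor_class(do=8.0, codmn=1.8, nh3n=0.2))  # 2
```

Under these assumed limits, one indicator slightly above the Class I limit is enough to move the whole sample to Class II, which is the pessimistic behavior the comprehensive model is meant to complement.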
However, such pessimistic evaluation results do not allow the government to grasp the overall situation of water pollution in a basin in time. The FSVM model proposed in this paper can solve this problem. Used together with the single factor evaluation, it makes it possible both to allocate water resources reasonably and to obtain the comprehensive pollution status of an area. Most evaluation results of the two models are the same, and the main differences appear in Classes I, II, and III. Part of the water quality standard is shown in Table 1. The dissolved oxygen and chemical oxygen demand of the first and fourth samples meet the Class I standard, but the ammonia nitrogen of the two samples exceeds the Class I standard, so, according to the pessimistic principle, they should be classified as Class II. The comprehensive evaluation, however, can classify them as Class I. Similarly, the chemical oxygen demand of samples No. 2, No. 3, and No. 5 does not meet the Class I standard while the other indicators do, so the evaluation results differ. Both the chemical oxygen demand and the ammonia nitrogen of samples No. 6, No. 7, and No. 9 exceed the standard, but their dissolved oxygen is very high. Moreover, the two indicators exceed the standard only slightly, so the comprehensive evaluation result is Class I. It can be seen that it is feasible to use the improved FSVM model to evaluate water quality, and that this model provides comprehensive evaluation results well.

Analysis and Comparison of Model Performance
The same data is also put into the standard SVM model and into the two methods described above, which are based on the intra-class hyperplane and the intra-class centripetal degree. Since the single factor evaluation can be used to divide water resources carefully, the ultimate goal of the FSVM model is only to provide the manager with real-time overall information.

On the one hand, for excellent water quality, we hope that the water judged to be excellent is really good, that is, that each sample placed in the good-quality class deserves it, even if the standard is so strict that some good water resources are not selected. Otherwise, if water resources that have begun to be polluted are misjudged as good quality, the managers cannot take immediate measures. Thus, the indicator precision is needed. On the other hand, for polluted areas, we want to identify all of the heavily polluted areas and do not want any of them to "escape", since this would also delay remediation. Therefore, the recognition range needs to be widened, even if some slightly polluted areas are included. Thus, the indicator recall is needed.

Therefore, both indicators are needed to achieve the goal of the model, but it is impossible to obtain the best of both at the same time. If more samples of a certain class are to be identified, there will inevitably be misjudgments. Thus, the paper adopts the F1-score to take both the model's precision and recall into account [40].
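For reference, the three indicators for a single class can be computed as in the sketch below. The labels are made up for illustration, the helper name `precision_recall_f1` is hypothetical, and the way per-class scores are aggregated across classes in Table 8 is not specified in the text, so only the one-class case is shown.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F1-score for one class treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Made-up water quality classes for six samples.
y_true = [1, 1, 1, 2, 2, 3]
y_pred = [1, 1, 2, 2, 1, 3]
precision, recall, f1 = precision_recall_f1(y_true, y_pred, positive=1)
print(precision, recall, f1)
```

Here Class I has two true positives, one false positive, and one false negative, so precision, recall, and F1-score all equal 2/3.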
The performance of the models is shown in Table 8. The improved FSVM model is clearly better than the other models, and its F1-score is 8% higher than that of the standard SVM model. These results show that the model in this paper evaluates water quality comprehensively and more effectively. The poor performance of the FSVM2 model here might be because it is specifically designed for gene classification problems.

Conclusions
Water is closely related to people's lives, and it is an indispensable resource. However, as industrial pollution worsens, it is imperative to classify water resources in different regions in order to use water more reasonably, efficiently, and safely. Out of consideration for people's safety, the current domestic evaluation method of water quality is single factor assessment. This method can eliminate potential misuse of water through pessimistic evaluation. At the same time, however, because its results are too pessimistic, it cannot describe the current state of water pollution in an area from a global perspective, so timely measures cannot be taken. Therefore, a model that can comprehensively evaluate water quality is essential.

After reviewing the traditional assessment methods of water quality and some evaluation methods of recent years, this paper decided to build a classification model based on the support vector machine. Firstly, the model is optimized by data preprocessing, data balancing, cross-validation, and parameter optimization. Then, since in many cases the samples do not fully comply with the water quality standards set by the state, noise is likely to occur in some ambiguous areas. Therefore, the standard SVM model is improved to remove the influence of noise and isolated points, and a membership function is adopted to form the FSVM model.

However, the membership functions proposed in several papers have problems in some cases. For this reason, this paper builds a new membership function step by step. The distance-based function and the compactness-based function are improved in turn. The closer a point is to the classification plane, the larger the distance-based function. The compactness-based function first identifies part of the noise points in a prior step, and it then determines whether a sample point is noise from the numbers of same-class and different-class points among its $p$ surrounding neighbors. Finally, the two functions are combined into the new membership function.

To verify whether the improved membership function of this paper is reasonable in practical applications, three experiments are carried out. Firstly, an experiment on an artificial dataset is used to observe, through two-dimensional visual data, whether the function is meaningful. Then, the performance of the function is tested on high-dimensional data from the UCI database. However, these two kinds of data are not actual water quality assessment data. Therefore, historical water quality monitoring data is finally used for the experiment.

The experiment on the artificial dataset shows that the model can deal with the negative effects of noise and isolated points. The experiment on the UCI dataset shows that the model performs well on high-dimensional real data. The experiment on the historical water quality monitoring data shows that the improved FSVM model proposed in this paper is better than the previous models to some extent. In the field of water quality assessment, it completes the comprehensive evaluation task well and provides better overall information.

Figure 1. Part of the hypothetical water quality sample points of class I and class II.

Design of the Improved Membership Function

Problems with Existing Membership Functions

(1) Basic architecture issues

As shown in Figure 4, the two points A and B contribute the same to the construction of the classification plane, but because their distances to their class centers differ, the membership values calculated by the above class-center-based method are different.

Figure 4. Traditional center-based hypersphere model.

Therefore, this paper decides to use the idea of the intra-class hyperplane to design the membership function. As shown in Figure 5, the class centers $x_+ = \frac{1}{n_+} \sum_{i=1}^{n_+} x_i$ and $x_- = \frac{1}{n_-} \sum_{i=1}^{n_-} x_i$ are first obtained. Then the two intra-class hyperplanes are constructed with the normal vector $W = x_+ - x_-$:

$$I_+: W^T (x - x_+) = 0, \qquad I_-: W^T (x - x_-) = 0 \tag{17}$$

Figure 6. Problem in the use of the parameter λ.

Figure 7. A situation of membership function based on compactness. (a,b) Distribution of the sample points for the two cases; (c,d) Identification of neighbors by a circle; (e,f) Distribution of neighbors.

Figure 8. Counterexample of the first attempt. The figures (a-d) are four different cases of surrounding neighbors.

Figure 9. Counterexample of the second attempt. The figures (a-d) are four different cases of surrounding neighbors.

Table 1. Part of the surface water quality standard.

Table 2. Results of experiments on artificial dataset.

Table 6. Distribution of water quality data.

Table 7. Part of the comparison of evaluation results.

Table 8. Evaluation index of these models.