Next Article in Journal
Developing Innovative Crutch Using IDeS (Industrial Design Structure) Methodology
Previous Article in Journal
Evaluation of Loading Bay Restrictions for the Installation of Offshore Wind Farms Using a Combination of Mixed-Integer Linear Programming and Model Predictive Control
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Enhanced Label Noise Filtering with Multiple Voting

1
College of Computer Science & Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
2
Department of Software, Sejong University, 209, Neungdong-ro, Gwangjin-gu, Seoul 05006, Korea
3
Department of Computer Science and Engineering, Oakland University, Rochester, MI 48309, USA
4
College of Technological Innovation, Zayed University, Dubai 144534, UAE
5
Institute of Information Systems, Innopolis University, Tatarstan 420500, Russia
6
Department of Computer Engineering, Kyung Hee University, Seocheon-dong, Giheung-gu, Yongin-si, Gyeonggi-do 17104, Korea
*
Author to whom correspondence should be addressed.
Joint first authors.
Appl. Sci. 2019, 9(23), 5031; https://doi.org/10.3390/app9235031
Submission received: 17 September 2019 / Revised: 13 November 2019 / Accepted: 15 November 2019 / Published: 21 November 2019
(This article belongs to the Section Computing and Artificial Intelligence)

Abstract

:
Label noises exist in many applications, and their presence can degrade learning performance. Researchers usually use filters to identify and eliminate them prior to training. The ensemble learning based filter (EnFilter) is the most widely used filter. According to the voting mechanism, EnFilter is mainly divided into two types: single-voting based (SVFilter) and multiple-voting based (MVFilter). In general, MVFilter is more often preferred because multiple-voting could address the intrinsic limitations of single-voting. However, the most important unsolved issue in MVFilter is how to determine the optimal decision point (ODP). Conceptually, the decision point is a threshold value, which determines the noise detection performance. To maximize the performance of MVFilter, we propose a novel approach to compute the optimal decision point. Our approach is data driven and cost sensitive, which determines the ODP based on the given noisy training dataset and noise misrecognition cost matrix. The core idea of our approach is to estimate the mislabeled data probability distributions, based on which the expected cost of each possible decision point could be inferred. Experimental results on a set of benchmark datasets illustrate the utility of our proposed approach.

1. Introduction

Real-world training data often include noises (or errors), which can be mainly categorized into two types: label error and feature error [1,2,3,4,5]. Label error arises when the class label of data is incorrect, while the feature error arises when the features of data are corrupted. These noises are made for various reasons. For example, sensor involved applications (such as WSN and IoT) may make noises due to the intrinsic instability of sensors [6,7]. In addition, big data further contribute to the emergence of noise [8]. When training data are noisy, the performance of learning based on it will be degraded. These two types of error have been individually studied by many works. We focus on the label error to study in this work.
The label errors are mainly caused by the subjective nature of the labeling task and lack of information for making the true label. Domain experts usually provide labeling that mainly depends on their heuristics and domain knowledge. It is a crucial fact that mislabeling cannot be even avoided with a thorough inspection of domain experts. It happens commonly when a consensus is not made during the annotation process by multiple domain experts. Mislabeling is very common in domains requiring rapid development, such as bioinformatics. For example, in a study on breast tumor [9], there existed nine subjective mislabelings among forty-nine features in the training data. Furthermore, mislabeling is also caused by insufficient information available to the expert [10,11]. An example of such information includes the unavailability of data of certain observation results of tests. Physicians are not confident to conclude the crisp diagnosis decision in the presence of partial information.
The existence of mislabeled data usually degrades the performance of learning [12,13,14,15,16,17]. In general, the goal of a learning algorithm is to search for the best hypothesis from its hypothesis space. In supervised learning, the best hypothesis is usually decided by the correlations between the features and the labels of training data. Therefore, searching for the best hypothesis will be influenced by the mislabeled data, which results in selecting a non-optimal hypothesis. The non-optimal hypothesis can bring a set of negative effects, including classification accuracy reduction, classifier construction time and complexity increase, and others.
The approaches dealing with mislabeled data are categorized into two main groups: robust algorithm design [18,19,20,21,22] and noise filter [23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41]. The first one is mainly good at developing a novel algorithm that deals with noisy data during model training. The second approach is good at the identification and filtering of mislabeled data before the training. Evidence exists that it is usually difficult to develop a robust algorithm that is insensitive to noisy data. Furthermore, it is revealed that mislabeled noisy data have a severe impact on the approach; even if the design is claimed to be robust. In comparison, filter based approaches have significant performance leverage over the robust algorithm. The core contribution of this work is in the area of filter based approaches.
Several filter based approaches are used to deal with mislabeled data, where the ensemble learning based filter (EnFilter) is a widely used approach based on its promising performance [23,24,30,33]. EnFilter leverages others with a unique approach by employing multiple classifiers to identify noises based on their voting.
According to the adopted voting mechanism, EnFilter consists of two types: single-voting based (SVFilter) and multiple-voting based (MVFilter).The SVFilter detects noises only based on one-time voting of multiple classifiers, and therefore, it has a potential instability problem.
To solve this instability problem, an MVFilter was proposed in our previous work [40]. In essence, an MVFilter consists of a set of SVFilters (assume this number is t). For the training data, if at least m ( m t ) SVFilters treat it as noisy, then the MVFilter regards it as noisy. The internal mechanism of the SVFilter makes comparisons between each SVFilter, therefore through their fusion, the MVFilter can improve the noise detection stability and accuracy compared to the SVFilter. In the design of the MVFilter, one of the key issues is how to define the value of m (called the decision point), which actually defines the noise detection rule. An optimal decision point (ODP) could maximize the performance of an MVFilter. In [40], the decision points were empirically explored with different representative values. However, a systematic approach to determining the ODP is lacking.
To this end, a novel approach is proposed in this work to compute the ODP for an MVFilter. Instead of only considering the number of errors, our approach takes cost information into account because many applications have unequal costs for various errors. When a cost matrix containing various cost values is given, the ODP selected by our approach is able to identify the noises that minimize the expected cost.
The core idea of our approach is as follows: firstly, estimating the mislabeled data distribution in the noisy training dataset; secondly, estimating the expected costs of each possible decision point; and finally, the optimal decision point determined by minimizing the expected cost.
We tested our approach based on a set of MVFilters. The experimental results show that our approach can significantly improve the performance of existing MVFilters. Our approach consistently works well for different datasets and different cost matrices. In addition, our approach is effective and straightforward. Only a few predefined parameters and prior knowledge are required.
In the next section, we will briefly review ensemble learning based noise filters. Section 3 analyzes the performance of the MVFilter when costs are considered. Our novel approach is presented in Section 4. The experimental evaluations are presented in Section 5. Section 6 concludes this work and presents future work.

2. Related Work

This work presents an approach to improve multiple-voting based ensemble filters for mislabeled data recognition. As necessary background knowledge, conventional ensemble learning based filters (EnFilter) will be introduced firstly. Then, multiple-voting based filter (MVFilter) will be presented.
EnFilter employs an ensemble classifier to detect mislabeled instances by constructing a set of base level classifiers and then using their classifications to identify mislabeled instances. The general approach is to tag an instance as mislabeled if x of the m base level classifiers cannot classify it correctly. The majority filter (MF) and consensus filter (CF) are the representative EnFilter algorithms [27,28]. MF tags an instance as mislabeled if more than half of the m base level classifiers classify it incorrectly. CF requires that all base level classifiers must fail to classify an instance as the class given by its training label for it to be eliminated from the training data.
The reason for employing ensemble classifiers in EnFilter is that the ensemble classifier has better performance than each base level classifier on a dataset if two conditions hold: (1) the probability of a correct classification by each individual classifier is greater than 0.,5 and (2) the errors in predictions of the base level classifiers are independent.
Algorithm 1 enlists the majority filter (MF) algorithm as a representative EnFilter algorithm. In Step 1, it initializes with the n disjoint subset of the training set E. In Step 2; it initializes the empty set A to reflect the noisy examples. The main loop in Steps 3–6 processes each subset E i , in an iterative manner. Step 4 establishes subset E t having all examples from E except the one existing in E i . These examples from E t are used in an arbitrary inductive learning algorithm in Step 6 to induce a hypothesis (a classifier) H j . In Step 14, all those examples from E i are added to A as potential noisy examples for which the majority of the hypothesis does not hold. CF is more conservative than MF because of the severer condition for noise identification, which ultimately results in fewer instances being eliminated from the training set. With a such property, the CF differs from MF in Step 14, and thus, it considers examples in E i as noisy only if all of them are classified incorrectly by the hypothesis. Furthermore, CF has the risk of retaining bad data.
As Algorithm 1 shows, the core of EnFilter is adopting a voting mechanism to recognize noises. Training data x, in subset E i after data partitioning on E, will be voted on by the multiple classifiers, which are trained based on the data in E\E i . Suppose y(x) is the function to determine whether x is mislabeled, then y(x) = vote(classifiers(E\E i ) , x). Like MF and CF, the conventional EnFilter decides y(x) only based on one-time voting and therefore, is a single-voting based filter (SVFilter).
As pointed out in [40], the SVFilter suffers from an instability problem. For data x, if the SVFilter runs twice, the first data random partitioning might assign x to subset E i , while the second time, it assigns x to subset E k . Therefore, we have y(x) = vote(classifiers(E\E i ) ) and y(x) = vote(classifiers(E\E k ) ) . Note that as there is diversity between E\E i and E\E k , the voting results of two SVFilters might be different. Therefore, instead of one-time voting, multiple-voting based filters (MVFilter) have been proposed to address this instability problem.
MVFilter consists of t SVFilters. Each SVFilter generates its own decision about suspected mislabeled data index A i . Finally, all the different decisions A i ( i = 1 : t ) will be combined by the MVFilter to output the final decision about which data are mislabeled. Therefore, the decision function of MVFilter can be described as y(x) = vote 2 (vote 1 (E\E 1 ) , vote 1 (E\E 2 ) , …, vote 1 (E\E t ) ) . In this function, vote 1 is the voting policy used by each SVFilter; vote 2 is the voting policy used by MVFilter; E i is the subset containing x obtained from the ith SVFilter. Usually, the vote 1 policy can either be based on majority voting or consensus voting. For the vote 2 policy, we have developed three policies: majority voting, consensus voting, and one-time veto. One-time vote means that if at least one SVFilter tags data as mislabeled, then the MVFilter will tag these data as mislabeled. In the MVFilter, different vote 1 and vote 2 policies can be combined to make various algorithms. As the example of MVFilter, the MF M F [40] algorithm is presented in Algorithm 2, which utilizes majority voting for both vote 1 and vote 2 .
Algorithm 1 Majority filtering algorithm.
Algorithm: majority filtering (MF)
Input: E (training set)
Parameter: n (number of subsets), y (number of learning algorithms), A 1 , A 2 , …, A y (y kinds of learning algorithms)
Output: A (detected noisy subset of E)
(1) form n disjoint almost equally sized subset of E i , where i E i = E
(2) A
(3) for i=1, …, n do
(4)  form E t E \ E i
(5)  for j = 1,…y do
(6)   induce H j based on examples in E t and A j
(7)  end for
(8)  for every e E i do
(9)    E r r o r C o u n t e r 0
(10)   for j = 1,…,y do
(11)    if Hj incorrectly classifies e
(12)    then E r r o r C o u n t e r E r r o r C o u n t e r + 1
(13)   end for
(14)   if E r r o r C o u n t e r > y 2 , then A A { e }
(15)  end for
(16) end for
Algorithm 2 MFMF algorithm.
MajorityFiltering_MajorityFiltering (MFMF)
Input: E (training set)
Parameter: n (number of subsets), y (number of learning algorithms), t (number of times of subsets partitioning), A1, A2, …, Ay(y kinds of learning algorithms)
Output: A (detected noisy subset of E)
(1) for p = 1,…, t do
(2)  form n disjoint almost equally sized subset of E p i , where i E p i = E
(3)   A p
(4)  for i = 1, …, n do
(5)   form E t E \ E p i
(6)   for j = 1,…y do
(7)   induce H p j based on examples in E t and A j
(8)   end for
(9)   for every e E p i do
(10)    E r r o r C o u n t e r 0
(11)   for j = 1,…,y do
(12)    if H p j incorrectly classifies e
(13)    then E r r o r C o u n t e r E r r o r C o u n t e r + 1
(14)   end for
(15)   if E r r o r C o u n t e r > y 2 , then A p A p { e }
(16)  end for
(17)  end for
(18) end for
(19) A
(20) for every e E do
(21)   E r r o r C o u n t e r 0
(22)  for j = 1,…, p do
(23)   if e A p
(24)   then E r r o r C o u n t e r E r r o r C o u n t e r + 1
(25)  end for
(26)  if E r r o r C o u n t e r > p 2 , then A A { e }
(27) end for

3. Analysis of Decision Point, Error Probability, and Cost for MVFilter

The multiple-voting based filter (MVFilter) consists of several single-voting based filters (SVFilter). The MVFilter treats data as mislabeled if at least m out of t SVFilters identify these data as mislabeled. Obviously, for different m values, the recognized noises by an MVFilter will be different. The selection of the m value plays an important role in an MVFilter. Because the m value decides the noise identifying results, it is called the “decision point” in this work. Our goal is to find a way to decide the “optimal decision point” to maximize the performance of MVFilter.
When a filter works on a noisy training dataset, it is usually hard to recognize all the noises perfectly. The errors made by a filter include two types: The first type (E1) occurs when declaring a correctly labeled example as mislabeled and is subsequently discarded. The second type of error (E2) corresponds to declaring a mislabeled example as correctly labeled. For a well designed filter, it is desirable to avoid both E1 and E2 errors. However, conceptually, E1 and E2 are conflicting. To reduce E1 errors, the filter should make a more severe noise detection policy, which tends to increase E2 errors. In MVFilter, the selection of the decision point will influence the probability to make an E1 or E2 error.

3.1. Relationship between the Decision Point and Error Probability in MVFilter

An MVFilter fuses the noise detection results of multiple SVFilters, while an SVFilter fuses the classification results of multiple classifiers. Therefore, the errors made by each classifier are the basis to infer the errors made by an MVFilter.
Let P(E1 i ) and P(E2 i ) be the probability that classifier i makes an E1 and E2 error, respectively. To clarify the analysis, it is assumed that all the various classifiers in an SVFilter have the same probability of making an error. Therefore, we assume that P ( E 1 i ) = P ( E 1 ) and P ( E 2 i ) = P ( E 2 ) . The most commonly used SVFilters include the majority filter (MF) and consensus filter (CF). The analysis here is based on MF, while a similar analysis can be conducted for CF.
MF makes an E1 (or E2) error when more than half of these classifiers make an E1 (or E2) error. If the number of classifiers in MF is y, then we have:
P ( E 1 M F ) = j > y / 2 j = y ( P ( E 1 ) j ( 1 P ( E 1 ) ) y j ) y j
P ( E 2 M F ) = j > y / 2 j = y ( P ( E 2 ) j ( 1 P ( E 2 ) ) y j ) y j
Suppose an MVFilter consists of t majority filters (MMF). Let P ( E 1 M F i ) and P ( E 2 M F i ) denote the probability that each MF makes an E1 and E2 error, respectively. To simplify the analysis, it is assumed that P ( E 1 M F i ) = P ( E 1 M F ) and P ( E 2 M F i ) = P ( E 2 M F ) . The decision rule of an MVFilter is “if at least m of the t SVFilters think data is mislabeled, then these data are identified as mislabeled”. This m value, called the decision point, will influence the probabilities of making an error for an MVFilter. Let MMF represent an MVFilter consisting of multiple majority filters, then the following relationships can be found:
P ( E 1 M M F ) =
j m j = t ( P ( E 1 M F ) j ( 1 P ( E 1 M F ) ) t j ) t j
P ( E 2 M M F ) =
j > t m j = t ( P ( E 2 M F ) j ( 1 P ( E 2 M F ) ) t j ) t j
The decision point value m can be any number between one and t. Among all possible values, the representative decision points include m = 1, m = t/2, m = t. When m = 1, data will be identified as mislabeled if at least one SVFilter thinks these data are mislabeled. When m = t, data will be identified as mislabeled only if all the t SVFilters think these data are mislabeled. Conceptually, the noise detection rule is too loose if the decision point is one, while the rule is too strict if the decision point is t. In this sense, m = t/2 is usually moderate. For these three representative decision points, we have the following relationships:
P ( E 1 M M F | m = 1 ) = 1 ( 1 P ( E 1 M F ) ) t
P ( E 2 M M F | m = 1 ) = P ( E 2 M F ) t
P ( E 1 M M F | m = t / 2 ) =
j t / 2 j = t ( P ( E 1 M F ) j ( 1 P ( E 1 M F ) ) t j ) t j
P ( E 2 M M F | m = t / 2 ) =
j > t / 2 j = t ( P ( E 2 M F ) j ( 1 P ( E 2 M F ) ) t j ) t j
P ( E 1 M M F | m = t ) = P ( E 1 M F ) t
P ( E 2 M M F | m = t ) = 1 ( 1 P ( E 2 M F ) ) t
For the above relationships, normally we have:
P ( E 1 M M F | m = t ) < P ( E 1 M M F | m = t / 2 ) <
P ( E 1 M M F | m = 1 )
P ( E 2 M M F | m = 1 ) < P ( E 2 M M F | m = t / 2 ) <
P ( E 2 M M F | m = t )
As P ( E 1 M M F ) and P ( E 2 M M F ) are conflicting, the optimal decision point should make a trade-off between these two probabilities. Therefore, if the probability of making errors is the only concern of MVFilter, the optimal decision point (ODP) is O D P = arg min m = 1 : j ( P ( E 1 M M F ) + P ( E 2 M M F ) ) .

3.2. Relationship between the Decision Point and Error Cost

In Section 3.1, for an MVFilter, the relationships between the optimal decision point and probabilities of making errors are analyzed. In this section, the costs of misrecognitions are considered. We will further analyze the relationships between the decision point and expected costs.
Misrecognition/error costs allow us to specify the relative importance of different kinds of errors. In fact, many applications have unequal misrecognition costs. In our previous work [41] while studying the behaviors of the supervised feature selection algorithm, we noticed a trade-off of a smaller and bigger number of noise-free data preferences among various algorithms. As a consequence of this trade-off, different costs should be determined for different errors. A smaller number of noise-free data yields a higher type 1 error cost compared to the type 2 error cost.
The various misrecognition costs are defined by a cost matrix. The cost matrix reflects the domain specific costs representing the cost sensitive model in the critical medical domain. Therefore, associative costs for a different type of error are finalized by the domain expert keeping the clinical context and consequences in mind.
As shown in Table 1, cost matrix C usually has the following structure, wherein the cost matrix rows correspond to predicted results, while columns correspond to actual results, i.e., row/column = predict/actual.
For correctly classified mislabeled (or noise-free) data, the cost is zero, and hence, it is normally assumed that C 00 = C 11 = 0 in the above matrix. With this assumption, the expected cost of an MVFilter is:
E x p e c t e d C o s t M V F i l t e r =
P ( E 1 M V F i l t e r ) C 01 + P ( E 2 M V F i l t e r ) C 10
As Section 3.1 shows, P ( E 1 M V F i l t e r ) and P ( E 2 M V F i l t e r ) are correlated with the decision point value. Therefore, E x p e c t e d C o s t M V F i l t e r is determined by both the decision point value and the cost matrix. If the cost matrix is fixed, then E x p e c t e d C o s t M V F i l t e r is only influenced by the decision point value. Therefore, the cost concerned optimal decision point should be:
O D P = arg min m = 1 : j ( P ( E 1 M V F i l t e r ) C 01 + P ( E 2 M V F i l t e r ) C 10 )
In this equation, if C 01 C 10 , P ( E 1 M V F i l t e r ) will be the dominant factor to determine ODP value. The ODP is the decision point that can minimize P ( E 1 M V F i l t e r ) . From the analysis in Section 3.1, we know that it is highly probable that O D P = t . On the other hand, if C 10 C 01 , P ( E 2 M V F i l t e r ) will be the dominant factor to determine the OPD value. In this case, it is likely that O D P = 1 .
It should be noted that the ODP can be determined from the above analysis only in some extreme cases (for example, when C 01 C 10 or C 10 C 01 ) . However, for the other cases, directly calculating the ODP is extremely difficult. In addition, the above equation of the ODP is obtained by making several assumptions. Therefore, it is not very useful to calculate the ODP value through mathematically inferring since the calculated ODP is influenced by the assumptions.

4. Novel Approach to Determine the Optimal Decision Point

In this section, we present our approach that can select the optimal decision point for an MVFilter by considering both cost information and the dataset itself.
Given a noisy training dataset, we define that the ODP is the value that can minimize the expected cost of an MVFilter. As pointed out in Section 3, mathematically inferring the ODP is difficult. Therefore, instead of directly inferring, we try to estimate the ODP implicitly.
For a noisy training dataset E, if we already know which data in E are mislabeled, it is trivial to decide the ODP. We just need to explore all the possible decision points. The OPD will be the point that minimizes the overall costs of misrecognitions.
Of course, the mislabeled data distribution in E is unknown since our mission is to identify mislabeled data from E. However, if there exists another noisy dataset E’ similar to E and with a known mislabeled data distribution, then we could implicitly estimate ODP from E’ instead of E since their ODPs should be similar.
This actually is the key idea of our approach. Given a noisy dataset E to handle, we will generate another dataset E’. The new generated E’ requires: (1) E’ and E are from the same/similar data distribution, and (2) the mislabeled data distributions in E and E’ are similar. If such an E’ could be generated, we can easily get the ODP based on E’ since the mislabeled data distribution in E’ is known.
In many real applications, in addition to the noisy dataset E, usually another validation dataset E n f is available. E n f contains only noise-free data and coming from the same data distribution as E. As there are no mislabeled data in E n f , the artificial erroneous labels are put into E n f . Here, we assume that the prior knowledge of the noise ratio in E is available, which is used to determine the erroneous labels put into E n f . Through the above procedures, E n f can be converted to E’. The optimal decision point from E’ can be used to estimate the actual decision point in E.
As the actual mislabeled data distribution in E is not available, we put erroneous labels in E n f in a random manner based on the prior noise ratio information. Although the mislabeled data in E are also stochastic, the mislabeled data distribution in E and E’ can have a great difference. In this case, the ODP value obtained from E’ is actually not optimal for E. To solve this problem, the ODP is estimated several times. This method uses the numIter parameter to control the specified number of iterations. Each time E’ changes since random erroneous labels are put into E n f . For each time, all the possible decision points (from one to t) will be explored, and accordingly, the cost of misrecognition is recorded. The average cost of each decision point value is obtained by taking the mean value of this decision point multiple times the misrecognition costs. Finally, the decision point having the least average cost is selected as the optimal decision point. The details of our algorithm are shown in Algorithm 3.
Algorithm 3 Optimal decision point estimation for MVFilter.
Algorithm: Searching optimal decision point for MVFilter
Input: E (training set), E n f (noise-free dataset)
Parameter: numIter (number of iterations to search ODP), noiseRatio (the noise ratio in E), MVFilter (the multiple-voting based filter algorithm), t (number of single-voting filters in MVFilter), C (cost matrix)
Output: ODP (optimal decision point)
(1) c o s t M a t r i x
(2) for i = 1,…, numIter do
(3)  randIndex←RandomPermutation( E n f )
(4)  noiseIndex←randIndex(1: E n f × n o i s e R a t i o )
(5)   E g e n e r a t e N o i s e ( E n f , n o i s e I n d e x )
(6)   c o s t V e c t o r
(7)  for m = 1,…, t do
(8)   noiseIndexDetected←MVFilter(E’, m)
(9)   index←InterSection(noiseIndex, noiseIndexDetected)
(10)   indexE1←noiseIndexDetected\index
(11)   indexE2←noiseIndex\index
(12)    c o s t i n d e x E 1 × C 01 + i n d e x E 2 × C 10
(13)   costVector(m) ←cost
(14)  end for
(15)  costMatrix = [costrMatrix; costVector];
(16)  end for
(17) O D P arg min m = 1 : t ( costMatrix ( 1 : e n d , m ) )
In Algorithm 3, it is assumed that another noise-free dataset E n f exists, which has the same distribution as E. Usually in a training dataset, some labels are certainly correct. These partial noise-free data are also used as a validation dataset in many applications. However, for a few applications, if E n f is unavailable, then this algorithm cannot be used directly. To solve this problem, we can directly use an MVFilter to mine the noise-free data from E. In this case, the loose noise detection policy is preferred by MVFilter. To generate E n f , the main concern is to make less E2 errors. Therefore, a small decision point value (for example, one) should be used by MVFilter. By this method, E n f can be collected from E. Then, Algorithm 3 can be used. The parameter noiseRatio in Algorithm 3 should also be noted. This parameter represents the noise ratio in E (mislabeled percentage of E). It is used to decide the number of erroneous labels to generate in E’. Here, we assume this is prior knowledge. For many applications, through years of experience, the rough noise ratio in a noisy training set is usually known. If this value is totally unknown, we also provide a solution. This parameter can be estimated from E by using an MVFilter. To estimate this parameter more accurately, MVFilter should select a decision point that considers the E1 and E2 error simultaneously. The value t/2 is a reasonable decision point since this decision point usually has a good trade-off between E1 and E2 errors.

5. Experimental Work

In this section, a set of experiments is conducted to verify the effectiveness of our proposed approach. To test its performance, several representative single-voting and multiple-voting based filters are used. SVFilters include the majority filter (MF) and consensus filter (CF) [27,28]. MF based MVFilters include MF 1 , MF M F , and MF C F [40]. CF based MVFilters include CF 1 , CF M F , and CF C F [40]. Suppose the number of SVFilters in an MVFilter is t. In MF 1 and CF 1 , the decision point is 1. In MF M F and CF M F , the decision point is t/2, while in MF C F and CF C F , the decision point is t. When the decision point is determined by our approach, the MF based MVFilter is denoted by MF O D P and the CF based MVFilter is denoted by CF O D P . When filtering noises, the costs incurred by MF O D P and CF O D P will be compared to other methods. If our approach is effective, MF O D P and CF O D P should incur less cost compared to other methods.
Six bioinformatics datasets from the UCI repository were used in this experiment. Information on these datasets is tabulated in Table 2, where pos/neg presents the percentage of the number of positive examples against that of negative examples.
An SVFilter (referring to Algorithm 1) is configured as follows: the number of subsets is 3 (n = 3); three learning algorithms are used (y = 3) including naive Bayes, decision tree, and 3-nearest neighbor. The configurations of an MVFilter (referring to Algorithm 2) are basically identical to the SVFilter configurations. One additional parameter in MVFilter is the number of SVFilters, which equals nine in the experiments (t = 9). Our proposed algorithm (referring to Algorithm 3) is based on MVFilter. Its additional parameter is the number of iterations to search for ODP. Here, it equals ten (numIter = 10).
The experiments were performed on each benchmark dataset by dividing it into a training set and test set. The filter algorithms were applied to each training set to remove the mislabeled data. Test data were only used by our algorithm, which is represented as E n f in Algorithm 3. It is important to clarify here that domain experts were involved to establish the noise-free benchmark dataset, which included the desired labels finalized after coming to a common consensus.
Making the cost value as a baseline computation, the performance of each filter algorithm was evaluated against each dataset D using the following steps:
  • Evaluating the performance of each filter using three trials derived from the threefold cross-validation of D. For each trail, 2/3 of D or Tr were used for the training set. We purposely changed some correct labels in the Tr using the predefined mislabeled ratio to generate the mislabeled data. For this purpose, three different mislabeled ratios were used: 10%, 20%, and 30%. As an example, for a 10% mislabeled ratio, 10% of the samples from Tr were randomly selected and then the correct labels changed.
  • The average cost of each algorithm was calculated by taking the mean cost of errors for each filter of the three trails.
  • In order to avoid the influence of the partitioning of D on the generated mislabeled data, we considered ten cost values retrieved from each experiment conducted ten times (i.e., repeating the previous two steps ten times).
  • Finally, the reported cost value was obtained as the mean of these ten values.

5.1. Experimental Investigation

Next, the experimental results of each dataset will be presented. Table 3 shows the comparisons of each filter in terms of cost on the Heart dataset. This table consists of three parts corresponding to three noise ratios (10%, 20%, and 30%). Under each noise ratio, the experiments were based on nine different cost matrices. Here, it was assumed that C 00 = C 11 = 0 , so only C 01 and C 10 were needed to define a cost matrix. For example, in the second row of Table 3, 1:1 means C 01 = C 10 = 1 , while 1:20 means C 01 = 1 , C 10 = 20 . The last column in Table 3, Ave., represents the average cost of each filter based on all nine cost matrices.
Table 3 shows that for all three noise ratios, CF O D P had the lowest average cost among all the CF based filters. Likewise, MF O D P was the best one among all the MF based filters. Moreover, under all the noise ratios and cost matrices, CF O D P and MF O D P outperformed other filters in most cases. This was in contrast to the other filters that heavily depend on cost matrices. For example, CF C F showed outstanding performance when C 01 > C 10 , but its performance decreased dramatically when C 10 increased. When the correlation between the cost and noise ratio was considered, we found that the cost of all the filters increased with noise ratio growth. However, compared with other filters, the cost increases of CF O D P and MF O D P were slow. In detail, when the noise ratio grew from 10% to 30%, the cost increase of CF O D P was 44, MF O D P was 12, while the cost increase of other filters was fast (for example, 97 for CF 1 and 102 for MF 1 ). By further comparing CF O D P and MF O D P , we found that under this dataset, CF O D P had a smaller average cost value. However, with the noise ratio increasing, the performance difference between them became small.
Table 4 shows the cost comparisons of each filter based on the Wdbcdataset. The experimental conclusions in Table 4 are similar to those of Table 3. In most cases (under different noise ratios and cost matrices), CF O D P and MF O D P were the winners. In addition, their advantages were more obvious when the noise ratio and cost value increased. When the noise ratio was 10%, CF O D P outperformed MF O D P . However, they showed similar performance when the noise ratio grew.
Table 5 presents the experimental results based on the Wpbc dataset. Similar to the experimental conclusions from Table 3 and Table 4, our approach could effectively improve the performance of the MF and CF based filters. In addition, CF O D P and MF O D P consistently worked well in different cases. Except for CF O D P and MF O D P , the performance of the other filters usually had a dramatic decline when the noise ratio increased. Moreover, other filters had obvious performance changes when the relationship of C 01 and C 10 changed. For example, MF M F worked well when C 10 > C 01 , but its performance became poor when C 01 > C 10 .
Table 6, Table 7 and Table 8 show the experimental results on the datasets of Spect, Spect1, and Promoter. Similar to the above analysis, these three tables clearly indicate the superiority of CF O D P and MF O D P .
Several important conclusions can be drawn by summarizing the above evaluation results:
(1) Selecting the optimal decision point by our approach could effectively improve the performance of an MVFilter. (2) CF O D P and MF O D P adapted to various noise ratios. In particular, even in a high noise ratio environment, the cost increases of CF O D P and MF O D P were not great. (3) Under different cost matrices, CF O D P and MF O D P consistently outperformed other filters. The advantages of CF O D P and MF O D P were more obvious when the difference between C 01 and C 10 was big. (4) Given a noisy training dataset, our proposed approach proved to be effective under different noise ratios and cost matrices if two conditions hold: (a) the noise ratio of this dataset is known; (b) there exists another noise-free training dataset that is drawn from the same distribution as this noisy dataset.

5.2. Extended Experimental Investigation

As pointed out above, our approach was verified to work well if the noise ratio and additional noise-free dataset were available. To further confirm the usability of our approach, we evaluated it in an environment where the two kinds of information were not available, that is the noisy training dataset E was the only available information.
The noise ratio was estimated by the CF M F algorithm. As an MVFilter, CF M F consists of t consensus filters. The decision point here equals t/2. In other words, if at least t/2 CFs identify data as mislabeled, then CF M F will regard that these data are mislabeled. For a noisy training dataset E, if n data are identified by CF M F , then the estimated noise ratio is n / E . The parameter configurations of CF M F were consistent with before (referring to the beginning of Section 4).
The noise-free dataset was obtained by applying CF 1 algorithm on E. CF 1 consists of t consensus filters. If at least one of CF identify data as mislabeled, then CF 1 will regard these data as mislabeled. Conceptually, the noise detection is loose, which aims to remove all the potential mislabeled data. Suppose the noises recognized by CF 1 are A. Then, the noise-free dataset is the subset of E, which excludes A. The configurations of CF 1 were in accordance with above experiments.
Table 9, Table 10, Table 11, Table 12, Table 13 and Table 14 show the experimental results on the benchmark datasets. Under all five datasets and all the noise ratios, CF O D P and MF O D P still showed outstanding performance, which defeated other filters in most cases. When compared to the experiment results in Table 3, Table 4, Table 5, Table 6, Table 7 and Table 8, we found that the performances of CF O D P and MF O D P had a certain extent of degradation in a few cases. However, in general, the performance change was moderate. This indicates that even without a noise ratio and an additional noise-free dataset, our proposed approach still worked well. One of the reasons was that in our approach, the mislabeled data distribution was estimated multiple times. Although the estimated mislabeled distribution for each time might be distorted, their fusion approached the real distribution. Then, the estimated ODP was also close to the real optimal decision point.
The two independent experimental evaluations in Section 5.1 and Section 5.2 proved that our proposed approach was effective and able to improve the performance of any MVFilter by selecting the optimal decision point. In particular, in the high noise ratio and high cost values, our approach showed significant improvements compared to other filters.

6. Conclusions and Future Works

In mislabeled data detection, the multiple-voting based filter (MVFilter) is generally superior to the conventional single-voting based filter (SVFilter). However, one important unsolved issue in the MVFilter is how to choose the optimal decision point (ODP) to maximize its noise detection performance.
In this paper, a novel approach was proposed to solve this issue. This approach implicitly computed the ODP by estimating the mislabeled data distribution in the noisy training dataset. Our approach took a noisy dataset and a cost matrix as input, then output an ODP, which aimed to minimize the expected cost of errors. Note that minimizing cost was one important contribution of this work, because most existing works were not aware of the importance of cost. They just implicitly assumed that all errors were equally costly, but in most real applications, this is far from the case.
A set of experimental evaluations was conducted, which proved the effectiveness of our approach. With the aid of our approach, an MVFilter could effectively reduced the cost. In particular, in the difficult noise detection environment (when the noise ratio was high or cost was big), the advantages of our approach were more obvious. Furthermore, the proposed methodology could also be extended to a multi-class problem. One possible strategy is the naive way to divide the multi-class problem into several two-class problems, and then, the proposed approach can solve each two-class problem.
Although the clean dataset (i.e., validation cases) and cost matrix are available in most cases, the prior information of noise ratio is not easily available; therefore, the current solution needs to be improved to alleviate the prior information requirement. Therefore, in future work, we will focus on developing more elegant approaches to further improve the current proposed approach.

7. Availability of Data and Material

All the datasets are available at http://archive.ics.uci.edu/ml/datasets.html.

Author Contributions

Conceptualization, D.G. and W.Y.; methodology, D.G. and W.Y.; software, D.G. and W.Y.; validation, D.G., M.H., W.Y. and A.M.K.; formal analysis, D.G., M.H. and W.Y.; investigation, D.G., W.Y. and M.F.; resources, D.G., W.Y. and A.M.K.; data curation, D.G., W.Y. and M.F.; writing–original draft preparation, D.G., M.H. and W.Y.; writing–review and editing, M.H. and W.A.K.; visualization, D.G. and W.Y.; supervision, D.G. and W.Y.; project administration, D.G. and W.Y.; funding acquisition, D.G., A.M.K. and W.A.K.

Funding

This research was supported by the Natural Science Foundation of China (Grant No. 61672284), the Natural Science Foundation of Jiangsu Province (Grant No. BK20171418), the China Postdoctoral Science Foundation (Grant No. 2016M591841), and the Jiangsu Planned Projects for Postdoctoral Research Funds (No. 1601225C). This research was also supported by the Defense Industrial Technology Development Program under Grant No. JCKY2016605B006. Furthermore, this research work was supported by the Zayed University Research Cluster Award # R18038. This research was also supported by the National Research Foundation (NRF) of Korea (NRF-2019R1G1A1011296).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Guan, D.; Yuan, W.; Lee, Y.K. Nearest neighbor editing aided by unlabeled data. Inf. Sci. 2009, 179, 2273–2282. [Google Scholar] [CrossRef]
  2. Van, J.; Khoshgoftaar, T.; Huang, H. The pairwise attribute noise detection algorithm. Knowl. Inf. Syst. 2007, 11, 171–190. [Google Scholar]
  3. Van, J.; Khoshgoftaar, T. Knowledge discovery from imbalanced and noisy data. Data Knowl. Eng. 2009, 68, 1513–1542. [Google Scholar]
  4. Zhu, X.Q.; Wu, X.D. Class noise vs. attribute noise: A quantitative study. Artif. Intell. Rev. 2004, 22, 177–210. [Google Scholar] [CrossRef]
  5. Zhu, X.Q.; Wu, X.D.; Yang, Y. Dynamic classifier selection for effective mining from noisy data streams. In Proceedings of the Fourth IEEE International Conference on Data Mining, Brighton, UK, 1–4 November 2004; pp. 305–312. [Google Scholar]
  6. Han, G.; Jiang, J.; Guizani, M.; Rodrigues, J.J.P.C. Green routing protocols for wireless multimedia sensor networks. IEEE Wirel. Commun. 2016, 23, 140–146. [Google Scholar] [CrossRef]
  7. Han, G.; Que, W.; Jia, G.; Zhang, W. Resource Utilization-aware Energy Efficient Server Consolidation Algorithm for Green Computing in IIOT. J. Netw. Comput. Appl. 2017. [Google Scholar] [CrossRef]
  8. Jia, G.; Han, G.; Jiang, J.; Liu, L. Dynamic Adaptive Replacement Policy in Shared Last-Level Cache of DRAM/PCM Hybrid Memory for Big Data Storage. IEEE Trans. Ind. Inform. 2016. [Google Scholar] [CrossRef]
  9. West, M.; Blanchette, C.; Dressman, H.; Huang, E.; Ishida, S.; Spang, R.; Zuzan, H.; Olson, J.A., Jr.; Marks, J.R.; Nevins, J.R. Predicting the clinical status of human breast cancer by using gene expression profiles. Proc. Natl. Acad. Sci. USA 2001, 98, 11462–11467. [Google Scholar] [CrossRef]
  10. Hickey, R.J. Noise modelling and evaluating learning from examples. Artif. Intell. 2006, 82, 157–179. [Google Scholar] [CrossRef]
  11. Pechenizkiy, M.; Tsymbal, A.; Puuronen, S.; Pechenizkiy, O. Class noise and supervised learning in medical domains: The effect of feature extraction. In Proceedings of the 19th IEEE Symposium on Computer-Based Medical Systems, Salt Lake City, UT, USA, 22–23 June 2006; pp. 708–713. [Google Scholar]
  12. Bi, Y.; Jeske, D.R. The efficiency of logistic regression compared to normal discriminant analysis under class-conditional classification noise. J. Multivar. Anal. 2010, 101, 1622–1637. [Google Scholar] [CrossRef]
  13. Nettleton, D.; Orriols-Puig, A.; Fornells, A. A study of the effect of different types of noise on the precision of supervised learning techniques. Artif. Intell. Rev. 2010, 33, 275–306. [Google Scholar] [CrossRef]
  14. Zhang, J.; Yang, Y. Robustness of regularized linear classification methods in text categorization. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Informaion Retrieval, Toronto, ON, Canada, 28 July–1 August 2003; pp. 190–197. [Google Scholar]
  15. Opitz, D.; Maclin, R. Popular ensemble methods: An empirical study. J. Artif. Intell. Res. 1999, 11, 169–198. [Google Scholar] [CrossRef]
  16. Dietterich, T.G. An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization. Mach. Learn. 2000, 40, 139–157. [Google Scholar] [CrossRef]
  17. Ratsch, G.; Onoda, T.; Muller, K. Soft margins for AdaBoost. Mach. Learn. 2001, 42, 287–320. [Google Scholar] [CrossRef]
  18. Bootkrajang, J.; Kaban, A. Classification of mislabelled microarrays using robust sparse logistic regression. Bioinformatics 2013, 29, 870–877. [Google Scholar] [CrossRef] [PubMed]
  19. Gu, B.; Victor, S.S. A Robust Regularization Path Algorithm for v-Support Vector Classification. IEEE Trans. Neural Netw. Learn. Syst. 2016. [Google Scholar] [CrossRef]
  20. Saez, J.; Galar, M.; Luengo, J.; Herrera, F. A first study on decomposition strategies with data with class noise using decision trees. In Hybrid Artificial Intelligent Systems; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2012; Volume 7209, pp. 25–35. [Google Scholar]
  21. Beigman, E.; Klebanov, B.B. Learning with annotation noise. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1, Singapore, 2–7 August 2009; pp. 280–287. [Google Scholar]
  22. Sastry, P.S.; Nagendra, G.D.; Manwani, N. A team of continuousaction learning automata for noise-tolerant learning of half-spaces. IEEE Trans. Syst. Man Cybern. B Cybern. 2010, 40, 19–28. [Google Scholar] [CrossRef]
  23. Manwani, N.; Sastry, P.S. Noise tolerance under risk minimization. IEEE Trans. Cybern. 2013, 43, 1146–1151. [Google Scholar] [CrossRef]
  24. Abellan, J.; Masegosa, A.R. Bagging decision trees on data sets with classification noise. In Foundations of Information and Knowledge Systems; Springer: Berlin/Heidelberg, Germany, 2010; pp. 248–265. [Google Scholar]
  25. Gu, B.; Sheng, V.S.; Tay, K.Y.; Romano, W.; Li, S. Incremental support vector learning for ordinal regression. IEEE Trans. Neural Netw. Learn. Syst. 2015, 26, 1403–1416. [Google Scholar] [CrossRef]
  26. Abellan, J.; Moral, S. Building classification trees using the total uncertainty criterion. Int. J. Intell. Syst. 2003, 18, 1215–1225. [Google Scholar] [CrossRef]
  27. Brodley, C.E.; Friedl, M.A. Improving autmated land cover mapping by identifying and eliminating mislabeled observations from training data. In Proceedings of the Geoscience and Remote Sensing Symposium, Lincoln, NE, USA, 31 May 1996; pp. 1379–1381. [Google Scholar]
  28. Brodley, C.E.; Friedl, M.A. Identifying mislabeled training data. J. Artif. Intell. Res. 1999, 11, 131–167. [Google Scholar] [CrossRef]
  29. Chaudhuri, B.B. A new definition of neighborhood of a point in multi-dimensional space. Pattern Recognit. Lett. 1996, 17, 11–17. [Google Scholar] [CrossRef]
  30. Guan, D.; Yuan, W.; Lee, Y.K.; Lee, S. Identifying mislabeled training data with the aid of unlabeled data. Appl. Intell. 2011, 35, 345–358. [Google Scholar] [CrossRef]
  31. John, G.H. Robust decision trees: Removing outliers from databases. In Proceedings of the International Conference on Knowledge Discovery and Data Mining, Montréal, QC, Canada, 20–21 August 1995; pp. 174–179. [Google Scholar]
  32. Marques, A.I. Decontamination of training data for supevised pattern recognition. In Advances in Pattern Recognition; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2000; Volume 1876, pp. 621–630. [Google Scholar]
  33. Sánchez, J.S.; Barandela, R.; Marqués, A.I.; Alejo, R.; Badenas, J. Analysis of new techniques to obtain quality training sets. Pattern Recognit. Lett. 2003, 24, 1015–1022. [Google Scholar] [CrossRef]
  34. Metxas, D.; Metaxas, D.; Fradkin, D.; Kulikowski, C.; Muchnik, I. Distinguishing mislabeled data from correctly labeled data in classifier design. In Proceedings of the 16th IEEE International Conference on Tools with Artificial Intelligence, Boca Raton, FL, USA, 15–17 November 2004; pp. 668–672. [Google Scholar]
  35. Verbaeten, S.; Assche, A.V. Ensemble methods for noise elimination in classification problems. In Multiple Classifier Systems; Springer: Berlin/Heidelberg, Germany, 2003; pp. 317–325. [Google Scholar]
  36. Wilson, D.L. Asymptotic properties of nearest neighbor rules using edited data. IEEE Trans. Syst. Man Cybern. 1992, 2, 431–433. [Google Scholar] [CrossRef]
  37. Wu, X.; Zhu, X.; Chen, Q. Eliminating class noise in large datasets. In Proceedings of the International Conference on Machine Learning, Washington, DC, USA, 21–24 August 2003; pp. 920–927. [Google Scholar]
  38. Young, J.; Ashburner, J.; Ourselin, S. Wrapper methods to correct mislabeled training data. In Proceedings of the 2013 International Workshop on Pattern Recognition in Neuroimaging, Philadelphia, PA, USA, 22–24 June 2013; pp. 170–173. [Google Scholar]
  39. Zhou, Z.H.; Jiang, Y. Editing training data for knn classifiers with neural network ensemble. In Advances in Neural Networks; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2004; Volume 3173, pp. 356–361. [Google Scholar]
  40. Guan, D.; Yuan, W.; Ma, T.; Lee, S. Detecting potential labeling errors for bioinformatics by multiple voting. Knowl.-Based Syst. 2014, 66, 28–35. [Google Scholar] [CrossRef]
  41. Yuan, W.; Guan, D.; Shen, L.; Pan, H. An empirical study of filter based feature selection algorithms using noisy training data. In Proceedings of the 4th IEEE International Conference on Information Science and Technology, Shenzhen, China, 26–28 April 2014; pp. 209–212. [Google Scholar]
Table 1. Cost matrix of the mislabeled data filter.
Table 1. Cost matrix of the mislabeled data filter.
Actual MislabeledActual Noise-Free
Predict mislabeled and eliminateC(0,0) = C 00 C(0,1) = C 01
Predict noise-free and retainC(1,0) = C 10 C(0,0) = C 11
Table 2. Datasets used in the experiment. pos, positive; neg, negative.
Table 2. Datasets used in the experiment. pos, positive; neg, negative.
DatasetNo. of FeaturesNo. of Instancespos/neg
Heart1427055.6%/44.4%
Wdbc3056962.7%/37.3%
Wpbc3319876.3%/23.7%
Spect2226779.4%/20.6%
Spect14426779.4%/20.6%
Promoter5710650%/50%
Table 3. Cost comparisons on Heart. Ave., average; CF, consensus filter; MF, majority filter.
Table 3. Cost comparisons on Heart. Ave., average; CF, consensus filter; MF, majority filter.
C 01 :C 10 (10% Noise Ratio)
1:11:31:51:101:203:15:110:120:1Ave.
CF223345741325486167328105
CF 1 343944568198161320638164
CF M F 223446771385283161315103
CF C F 17406412324127376211280
CF O D P 173544607927376211252
MF3441486510096157311619164
MF 1 59636776951742885741146282
MF M F 313743599087143282561148
MF C F 20324474134497815029598
MF O D P 2034436387497815029591
C 01 :C 10 (20% Noise Ratio)
CF2962951773425581145274140
CF 1 395367102172104169331656188
CF M F 2655831532955176137261126
CF C F 298213626953633384971138
CF O D P 2753661021723438497168
MF415875118203105170331653195
MF 1 7077841011362043386721341336
MF M F 36526710618493151293579174
MF C F 2758891663205073131247129
MF O D P 28516899139507313124798
C 01 :C 10 (30% Noise Ratio)
CF421011603086046893156283202
CF 1 507396153268128205399787240
CF M F 39971563015925980130231187
CF C F 4814223647194151535971230
CF O D P 3875961532685253597196
MF4980111189344115181347678233
MF 1 8088951131502343877701537384
MF M F 4472100169308104164315616210
MF C F 411051683286475976121210195
MF O D P 3971891161505976121210103
Table 4. Cost comparisons on Wdbc.
Table 4. Cost comparisons on Wdbc.
1:11:31:51:101:203:15:110:120:1Ave.
CF16294172134365610520477
CF 1 21263042655996190378101
CF M F 1424345910933529919469
CF C F 1640631232422431519075
CF O D P 142328437025335610043
MF242832446666109216429113
MF 1 3840434961112185369737182
MF M F 2025294063579318436798
MF C F 1523315191365711121770
MF O D P 1524304560365711121766
C 01 :C 10 (20% Noise Ratio)
CF2664102198389395286152123
CF 1 233649821475587167327108
CF M F 245791175342375186155113
CF C F 5014824649097954576480241
CF O D P 213649821473849658163
MF2740528314670112218431131
MF 1 697581961262003316601316328
MF M F 223039621065690176347103
MF C F 235588170333354779141108
MF O D P 1729406811734477914164
C 01 :C 10 (30% Noise Ratio)
CF501332174268446682123204238
CF 1 2848691212246296183355132
CF M F 431161903737405668100163205
CF C F 9026744588817759295101113429
CF O D P 284869121224527110011191
MF426485139247105169326642202
MF 1 8591971101382514168291656408
MF M F 2844599717469110213419135
MF C F 451211963867655972107176214
MF O D P 26435790147517610717686
Table 5. Cost comparisons on Wpbc.
Table 5. Cost comparisons on Wpbc.
C 01 :C 10 (10% Noise Ratio)
1:11:31:51:101:203:15:110:120:1Ave.
CF223956981824976143277105
CF 1 4550556894131216429856216
CF M F 19344986160416412123490
CF C F 1440661312611619243567
CF O D P 14355265911619243539
MF3945516798110182361719186
MF 1 80828387952393987951590383
MF M F 3640455678103171339676171
MF C F 18355396183375610319887
MF O D P 1835456390375610319872
C 01 :C 10 (20% Noise Ratio)
CF32671021903666190162306153
CF 1 506886131221131212416822237
CF M F 28651011923744868117216134
CF C F 277913026051929313646128
CF O D P 2660821312213033364674
MF456584132229117188366723217
MF 1 7884901041332293807571512374
MF M F 426079126219106170332654199
MF C F 2767107207407425794169131
MF O D P 28597710213242579416985
C 01 :C 10 (30% Noise Ratio)
CF40951502865606793158290193
CF 1 537392141239140226443876254
CF M F 38971573066035369109188180
CF C F 3911619438777439404143186
CF O D P 3777921412393940414383
MF5080110185335120190365715239
MF 1 757983941152203657281453357
MF M F 4874101168301116185357700228
MF C F 391021653246405266101169184
MF O D P 39758294115526610116988
Table 6. Cost comparisons on Spect.
Table 6. Cost comparisons on Spect.
C 01 :C 10 (10% Noise Ratio)
1:11:31:51:101:203:15:110:120:1Ave.
CF233956981815179151293108
CF 1 39506086139108176347690188
CF M F 193755100190406111321893
CF C F 1642681332632228437376
CF O D P 164054931392228437356
MF42505878118117192380755199
MF 1 717478871042083466901377337
MF M F 36435170108100164325646171
MF C F 20375498184426412023194
MF O D P 2039516992426412023181
C 01 :C 10 (20% Noise Ratio)
CF34711082013866595172325162
CF 1 486990144250122196381751228
CF M F 31691072033945579138257148
CF C F 319014829558834384663148
CF O D P 3268951442503638466386
MF456381125214119192374740217
MF 1 7783891041342263757471492370
MF M F 426077121209108175341673201
MF C F 30721132164235070118216145
MF O D P 316078108134507011821696
C 01 :C 10 (30% Noise Ratio)
CF4911317733665583117203374234
CF 1 5989118193342147234454893281
CF M F 451091743356577196160288215
CF C F 5114824649097855596989243
CF O D P 468911819334255596989118
MF5689122203367136216416816269
MF 1 891001111371912584268461688427
MF M F 5488123208379128203388759259
MF C F 451151863617126586136237216
MF O D P 45861091371916586136237121
Table 7. Cost comparisons on Spect1.
Table 7. Cost comparisons on Spect1.
C 01 :C 10 (10% Noise Ratio)
1:11:31:51:101:203:15:110:120:1Ave.
CF28456210418967107205402134
CF 1 50576482117142234464924237
CF M F 2341601061985077145280109
CF C F 1948781533012633528889
CF O D P 194259851172633528858
MF47566589135131215426847223
MF 1 828487951092424028031604390
MF M F 41505980124115189373742197
MF C F 2342601061985179148287110
MF O D P 23436083120517914828799
C 01 :C 10 (20% Noise Ratio)
CF35761162174196595170320168
CF 1 516987131220136221433858245
CF M F 34781222324525780138253160
CF C F 349816232364437404863161
CF O D P 3366881312203740486381
MF537597152262137222432853254
MF 1 8692991141452524198341665412
MF M F 476889142247119191371731222
MF C F 32771232374655068114205152
MF O D P 326585112143506811420597
C 01 :C 10 (30% Noise Ratio)
CF5312519837974187120204372253
CF 1 711011312053541842965771139340
CF M F 491282074058006886133226234
CF C F 53158263524104855576170254
CF O D P 5110413120535454566270121
MF66105143240433159252485950315
MF 1 1021091151311643014999941985489
MF M F 6199136229416147232445872293
MF C F 521362214328547190138233247
MF O D P 521031251311647190138233123
Table 8. Cost comparisons on Promoter.
Table 8. Cost comparisons on Promoter.
C 01 :C 10 (10% Noise Ratio)
1:11:31:51:101:203:15:110:120:1Ave.
CF101929529920315710947
CF 1 232629385465107212422108
CF M F 81929541061522387240
CF C F 7223773147777735
CF O D P 819293754877720
MF2024294164559017735194
MF 1 4344444649128213426851205
MF M F 1720233045488015831582
MF C F 81829551071318315637
MF O D P 8182332411318315627
C 01 :C 10 (20% Noise Ratio)
CF1634519618530447914976
CF 1 263239548572118233463125
CF M F 1333531011992128468364
CF C F 1544721452891516171970
CF O D P 14344054851516171933
MF2432416310663102199393114
MF 1 4243444752124207413825200
MF M F 223140621075894183362107
MF C F 1435561082122027447766
MF O D P 14294446502027447739
C 01 :C 10 (30% Noise Ratio)
CF224976144279385495176104
CF 1 3244578815083133261515151
CF M F 20528416432429375810097
CF C F 216410621242421212121101
CF O D P 214457881502121212149
MF2743589817765103198388128
MF 1 4145485673120199397793197
MF M F 243852881595790174340114
MF C F 2054881733432530436893
MF O D P 20404856732530436845
Table 9. Cost comparisons on Heart in the case that the noise ratio and noise-free dataset are unavailable.
Table 9. Cost comparisons on Heart in the case that the noise ratio and noise-free dataset are unavailable.
C 01 :C 10 (10% Noise Ratio)
1:11:31:51:101:203:15:110:120:1Ave.
CF233750821485791175344112
CF 1 343945578398161320638164
CF M F 213447791445180154302101
CF C F 1639621192332433559875
CF O D P 2635415683516910618572
MF3642486393101166330656170
MF 1 61646773871813016001199293
MF M F 313641558187143283563147
MF C F 21324473131518115630699
MF O D P 2333415785538516030694
C 01 :C 10 (20% Noise Ratio)
CF2857861583025582149283133
CF 1 40516291147107175345683189
CF M F 2860921723325378139263135
CF C F 308514128055834384868142
CF O D P 295266961465369819877
MF39526598164104169331655186
MF 1 677277881121973276511300321
MF M F 3648609015097158309613174
MF C F 2760931743385072127239131
MF O D P 30486084119557812723993
C 01 :C 10 (30% Noise Ratio)
CF431021623126106893157283203
CF 1 456891147261114182352694217
CF M F 40991583076035979129227189
CF C F 4814323947695250515563231
CF O D P 38769914726156658084101
MF5179108178319125199384754244
MF 1 7683911081442223677311459365
MF M F 436995160290103162312611205
MF C F 401031673266445672112192190
MF O D P 3775961271785978112192106
Table 10. Cost comparisons on Wdbc in the case that the noise ratio and noise-free dataset are unavailable.
Table 10. Cost comparisons on Wdbc in the case that the noise ratio and noise-free dataset are unavailable.
C 01 :C 10 (10% Noise Ratio)
1:11:31:51:101:203:15:110:120:1Ave.
CF1732478616235539919080
CF 1 21293653895793182361102
CF M F 1629427514034529818975
CF C F 1950811593152734528992
CF O D P 172735528837558915462
MF222834497961100196390107
MF 1 3841445268111184367733182
MF M F 2025304368548817434594
MF C F 16263764117365710921275
MF O D P 1625314367365710921266
C 01 :C 10 (20% Noise Ratio)
CF2664102197387415693168126
CF 1 223446751345486167328105
CF M F 225283159311354881146104
CF C F 4713822845490651556585226
CF O D P 2135467513437498012166
MF2839507813372116226446132
MF 1 606469811041742895761149285
MF M F 223139591005995186369107
MF C F 21487414227635508615899
MF O D P 213040619640548615865
C 01 :C 10 (30% Noise Ratio)
CF561522484899707186125201266
CF 1 32608815829867103192370152
CF M F 491342194318566377112182236
CF C F 922744569111821939599107438
CF O D P 3460881582985979119137115
MF4678109188346106167318620220
MF 1 961021091251572824679311859459
MF M F 30496711320572114219429144
MF C F 481312144228386174107173230
MF O D P 305068103166567411217392
Table 11. Cost comparisons on Wpbc in the case that the noise ratio and noise-free dataset are unavailable.
Table 11. Cost comparisons on Wpbc in the case that the noise ratio and noise-free dataset are unavailable.
C 01 :C 10 (10% Noise Ratio)
1:11:31:51:101:203:15:110:120:1Ave.
CF2340581021895078148287108
CF 1 43515978117121199395786205
CF M F 2139571031944468127245100
CF C F 1540661312601822314870
CF O D P 234257791183038536456
MF41495674111117192380757197
MF 1 79808185912353917811561376
MF M F 38465473112107176348693183
MF C F 193856102194395910820791
MF O D P 22425173101405910820778
C 01 :C 10 (20% Noise Ratio)
CF3063951753365987158301145
CF 1 526579113181142232457907247
CF M F 29661021933755173128238140
CF C F 267712825651228293239125
CF O D P 3063861231814242528178
MF456177116195121196383759217
MF 1 818589991192383967901578386
MF M F 445974112188118191374741211
MF C F 27641021953824461103188130
MF O D P 306183109169466410318895
C 01 :C 10 (30% Noise Ratio)
CF39971542975846182135242188
CF 1 5381109178317132210406798254
CF M F 37961553015955369108187178
CF C F 4011719438877441434856189
CF O D P 388712520833345475560111
MF5182113191346121192368720243
MF 1 7480861011312173607161430355
MF M F 4980110187340116183351686234
MF C F 381011643206345266101171183
MF O D P 39851141802405368101171117
Table 12. Cost comparisons on Spect in the case that the noise ratio and noise-free dataset are unavailable.
Table 12. Cost comparisons on Spect in the case that the noise ratio and noise-free dataset are unavailable.
C 01 :C 10 (10% Noise Ratio)
1:11:31:51:101:203:15:110:120:1Ave.
CF19344987163426512223691
CF 1 37455475117102168331658176
CF M F 21375393173467213626499
CF C F 1743701372702431488280
CF O D P 2038517611733436410060
MF40485677118110181359713189
MF 1 69727581952053416811361331
MF M F 36455374116100164323642173
MF C F 20365291170446812925096
MF O D P 20385175100446812925086
C 01 :C 10 (20% Noise Ratio)
CF35751152154156494168316166
CF 1 486990143249123198385759229
CF M F 31711112124135375129238148
CF C F 329215230260235394866152
CF O D P 3269951492603842486589
MF456686138241115184358706216
MF 1 7886931121502263757461488373
MF M F 446177119202115186364719210
MF C F 32741162214315374128234151
MF O D P 3262821121645374128234105
C 01 :C 10 (30% Noise Ratio)
CF5111317533164290129227422242
CF 1 6187112175302159256499986293
CF M F 4510516631962373101172313213
CF C F 5115025049899553566274243
CF O D P 478811817530267646784113
MF6191122198350153244473931291
MF 1 92971011131362714518991796440
MF M F 5079108180324120191368721238
MF C F 451171893697296380125214214
MF O D P 44811101361926480125214116
Table 13. Cost comparisons on Spect1 in the case that the noise ratio and noise-free dataset are unavailable.
Table 13. Cost comparisons on Spect1 in the case that the noise ratio and noise-free dataset are unavailable.
C 01 :C 10 (10% Noise Ratio)
1:11:31:51:101:203:15:110:120:1Ave.
CF2543601031905993176344122
CF 1 4551567098130216428854217
CF M F 2443611092035281152294113
CF C F 1745731442852329437281
CF O D P 274556711014147639661
MF45536182123126207410815213
MF 1 848689961092484138251648400
MF M F 41495777117115190375746196
MF C F 2341591041945077145280108
MF O D P 28455880109517915028098
C 01 :C 10 (20% Noise Ratio)
CF388312723946269100178333181
CF 1 527089134225138224439869249
CF M F 35791232344555984147271165
CF C F 3510216933667038414965167
CF O D P 3670971342255563676891
MF5578102161278140226441870261
MF 1 91961021161432664428821761433
MF M F 466687138240117189367724219
MF C F 34791242364605780137251162
MF O D P 3668941361975980137251118
C 01 :C 10 (30% Noise Ratio)
CF5713421140579193129220401271
CF 1 681051412324141682685171016325
CF M F 5112820539778176101163287243
CF C F 53158262522104456596579255
CF O D P 5110514923241462626579136
MF65106147249454155245470920313
MF 1 961041131331742804649241844459
MF M F 5893127213385140222427837278
MF C F 491292094098096784129218234
MF O D P 481021311902546784129218136
Table 14. Cost comparisons on Promoter in the case that the noise ratio and noise-free dataset are unavailable.
Table 14. Cost comparisons on Promoter in the case that the noise ratio and noise-free dataset are unavailable.
C 01 :C 10 (10% Noise Ratio)
1:11:31:51:101:203:15:110:120:1Ave.
CF91929531021725468743
CF 1 222529385662102203404105
CF M F 7152447911115264731
CF C F 7213469138777733
CF O D P 716244168888921
MF1824294371497915630987
MF 1 4344454955127211422843204
MF M F 1721253657467514829380
MF C F 7162548931115264732
MF O D P 7152235531115264726
C 01 :C 10 (20% Noise Ratio)
CF15345310119726376411872
CF 1 273442599575122241479131
CF M F 1333531022011925417262
CF C F 1442711412821414141467
CF O D P 133043651001514141434
MF22303857965894185366105
MF 1 4244465161123204408814199
MF M F 20293758101538516532697
MF C F 1333531022012026427463
MF O D P 14283950722026427441
C 01 :C 10 (30% Noise Ratio)
CF21497714828934478014599
CF 1 2840517913673118231457135
CF M F 1950821603172632498291
CF C F 216410621242522222223102
CF O D P 214461841372322222349
MF2742589617365104200392128
MF 1 3840424655113187373745182
MF M F 254056941716094181354119
MF C F 2155881733422936569599
MF O D P 21465583932936569557

Share and Cite

MDPI and ACS Style

Guan, D.; Hussain, M.; Yuan, W.; Khattak, A.M.; Fahim, M.; Khan, W.A. Enhanced Label Noise Filtering with Multiple Voting. Appl. Sci. 2019, 9, 5031. https://doi.org/10.3390/app9235031

AMA Style

Guan D, Hussain M, Yuan W, Khattak AM, Fahim M, Khan WA. Enhanced Label Noise Filtering with Multiple Voting. Applied Sciences. 2019; 9(23):5031. https://doi.org/10.3390/app9235031

Chicago/Turabian Style

Guan, Donghai, Maqbool Hussain, Weiwei Yuan, Asad Masood Khattak, Muhammad Fahim, and Wajahat Ali Khan. 2019. "Enhanced Label Noise Filtering with Multiple Voting" Applied Sciences 9, no. 23: 5031. https://doi.org/10.3390/app9235031

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop