Noise Models in Classification: Unified Nomenclature, Extended Taxonomy and Pragmatic Categorization
Abstract
1. Introduction
1. Presentation of the first review of noise models for classification, including label noise, attribute noise and both in combination.
2. Analysis of the structure of noise models, which is usually overlooked in the literature, identifying the fundamental components that allow their characterization.
3. Detection of the absence of, and the lack of uniformity in, the nomenclature of noise models in the literature.
4. Proposal of a nomenclature to name noise models in a descriptive way, referring to their main structural components.
5. Unification of the existing taxonomies in the literature and their update to better reflect the types of noise models and their characteristics.
6. Categorization of noise models from a practical point of view, depending on the characteristics of the noise and the available knowledge of the problem domain.
2. Background
2.1. Need for a Unified Nomenclature
1. Many models are not assigned an identifying name.
2. There are discrepancies when naming known models.
2.2. Current Taxonomies
1. Noisy completely at random (NCAR). These are the simplest models to introduce noise. In NCAR models, the probability of mislabeling a sample does not depend on the information in its class or attributes [38].
2. Noisy at random (NAR). In NAR models, the mislabeling probability of a sample depends on its class label. This type of model allows considering a different noise level for each of the classes in the dataset. NAR is usually modeled by means of a transition matrix [56] (Equation (1)), whose entry $t_{kl}$ is the probability of a sample with the $k$-th class label being mislabeled with the $l$-th one (a sketch of this matrix is given after this list).
3. Noisy not at random (NNAR). These use both the information in class labels and attribute values to determine the mislabeled samples. They constitute a more realistic scenario consisting of mislabeling samples in specific areas, such as decision boundaries, where classes share similar characteristics [22].
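Equation (1) is not reproduced above; the following is a minimal sketch of the usual form of such a transition matrix for $c$ classes, with the entry notation $t_{kl}$ and the labels $l_1, \dots, l_c$ assumed here for illustration rather than taken from the original equation:

```latex
% Sketch of a label transition matrix for c classes (notation assumed here):
% t_{kl} is the probability that a sample with true label l_k receives the
% label l_l after noise injection, so each row sums to 1.
\[
T =
\begin{pmatrix}
t_{11} & t_{12} & \cdots & t_{1c} \\
t_{21} & t_{22} & \cdots & t_{2c} \\
\vdots & \vdots & \ddots & \vdots \\
t_{c1} & t_{c2} & \cdots & t_{cc}
\end{pmatrix},
\qquad
t_{kl} = P(\hat{y} = l_l \mid y = l_k),
\qquad
\sum_{l=1}^{c} t_{kl} = 1 .
\]
```

Under this convention, an NCAR model with uniform disruption makes all off-diagonal entries equal, whereas an NAR model allows a different noise level (row) per class.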
1. Affected variables. This divides the noise models according to whether they introduce label noise, attribute noise or their combination.
2. Error distribution. This classifies the models by considering whether the introduced errors follow some known probability distribution, such as Gaussian.
3. Magnitude of errors. This divides the models according to whether the magnitude of the generated errors is relative to the values of each sample or to the minimum, maximum or standard deviation of each variable.
3. A New Unified Nomenclature for Noise Models
1. Noise type. This indicates whether the model introduces label noise, attribute noise or a combination of both.
2. Selection procedure. Let $I = \{1, \dots, n\}$ be the set of indices of samples and $V$ be the set of indices of variables (output class label and input attributes) in a dataset $D$ to be corrupted. The selection procedure creates a set $S \subseteq I \times V$ of pairs $(i, j)$ with the values to be altered. An element $(i, j) \in S$ indicates that the $j$-th variable in the $i$-th sample (that is, the value $x_{ij}$) must be corrupted. Note that $j$ corresponds to the class label for label noise models, to an attribute for attribute noise models and to either of them for combined noise models. The set $S_j = \{i : (i, j) \in S\}$ can be seen as the indices of the samples to corrupt for each variable $j \in V$ in which noise is introduced. There are multiple ways to select such samples [20,69,70]. For example, all samples can have the same probability of being chosen [38,71], a different probability can be defined for the samples according to their class label [20,72], the probability of choosing a sample can depend on its proximity to the decision boundaries [69,70,73], the samples in a certain area of the domain can be altered [16,74] or even all the samples in the dataset can be selected to be corrupted [67].
3. Disruption procedure. Given the set $S_j$ indicating the samples to be corrupted for each noisy variable $j \in V$, this procedure alters their original values $x_{ij}$ by changing them to new noisy ones $\hat{x}_{ij}$. As in the case of the selection procedure, there are different alternatives to modify the original values by the disruption procedure [24,71,75]. For example, a new value within the domain can be chosen including the original value [26,76] or excluding it [38,77], a default value can be chosen depending on the original value [75,78,79] or independently of it [71,72], or additive noise following a Gaussian distribution can be considered [17,67], among other options [80,81]. A minimal sketch combining both procedures is given after this list.
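As an illustration of how the two procedures compose, the following is a minimal sketch of a Symmetric uniform label noise scheme in the spirit of [38]: the selection is a Bernoulli trial per sample at the noise level, and the disruption draws a different class uniformly at random. This is an illustrative example under those assumptions, not the reference implementation of any cited work, and the function and variable names are arbitrary.

```python
import numpy as np

def symmetric_uniform_label_noise(y, noise_level, rng=None):
    """Corrupt class labels: each sample is selected with probability
    `noise_level` (selection procedure) and, if selected, its label is
    replaced by a different class chosen uniformly at random
    (disruption procedure, original value excluded as in [38])."""
    rng = np.random.default_rng() if rng is None else rng
    y = np.asarray(y)
    classes = np.unique(y)
    noisy_y = y.copy()
    selected = rng.random(y.shape[0]) < noise_level   # selection: one Bernoulli trial per sample
    for i in np.flatnonzero(selected):
        candidates = classes[classes != y[i]]         # disruption: exclude the original label
        noisy_y[i] = rng.choice(candidates)
    return noisy_y

# Example usage on a toy label vector:
# noisy = symmetric_uniform_label_noise([0, 1, 1, 2, 0, 2], noise_level=0.2)
```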
4. Proposal for an Extended Taxonomy of Noise Models
- Noise type (Section 4.1). This classifies the models based on the introduction of label noise, attribute noise or both.
- Selection source (Section 4.2). This categorizes the models according to the information sources (class and/or attributes) used by the selection procedure.
- Disruption source (Section 4.3). This is similar to the selection source but aimed at the disruption procedure.
- Selection distribution (Section 4.4). The main probability distribution, if any, underlying the selection procedure.
- Disruption distribution (Section 4.5). The main probability distribution related to the disruption procedure.
4.1. Noise Type
4.2. Selection Source
1. Noisy completely at random (NCAR). These are models that do not consider the information of labels or attributes to select the noisy samples. For example, Symmetric uniform label noise [38] corrupts the class labels in the dataset assigning to each sample the same probability of being altered, whereas Unconditional vp-Gaussian attribute noise [67] corrupts all the samples.
2. Noisy at random (NAR). These are models whose selection procedure uses the same information source as the noise type they introduce. Therefore, NAR label [33,35] and attribute [58,82] noise models use class information and attribute information, respectively, to determine the samples to corrupt. For example, Asymmetric uniform label noise [20] and Asymmetric uniform attribute noise [82] consider a different noise probability for each class label and for each attribute, respectively.
3. Noisy partially at random (NPAR). These are models whose selection procedure uses the opposite information source to the noise type they introduce. Examples of this type of selection source are found in Attribute-mean uniform label noise [86], which gives a higher probability of corrupting the samples whose attributes are closer to the mean values, or in the aforementioned Quadrant-based uniform label noise [62].
4. Noisy not at random (NNAR). The selection procedure of these models uses both class and attribute information to determine the noisy samples. For example, Neighborwise borderline label noise [15] determines the samples to corrupt by computing a noise score for each sample as a function of its distances to its nearest neighbors (using the information from attributes) of the same and of different classes (using the information from class labels). An illustrative sketch of a borderline-style selection follows this list.
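The sketch below illustrates an NNAR-style selection that favors samples near the decision boundary. The score used here (distance to the nearest same-class neighbor relative to the distances to the nearest same- and different-class neighbors) is an assumption chosen for illustration, not the exact formulation of Neighborwise borderline label noise [15] or any other cited model.

```python
import numpy as np

def borderline_selection(X, y, noise_level, rng=None):
    """NNAR-style selection: samples close to samples of other classes get a
    higher probability of being chosen. Illustrative sketch only; assumes
    every class has at least two samples."""
    rng = np.random.default_rng() if rng is None else rng
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    n = X.shape[0]
    scores = np.empty(n)
    for i in range(n):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                            # ignore the sample itself
        d_same = d[y == y[i]].min()              # nearest same-class neighbor
        d_other = d[y != y[i]].min()             # nearest other-class neighbor
        scores[i] = d_same / (d_same + d_other)  # close to 1 near the boundary
    probs = scores / scores.sum()
    n_noisy = int(round(noise_level * n))
    return rng.choice(n, size=n_noisy, replace=False, p=probs)  # indices of samples to corrupt
```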
4.3. Disruption Source
1. Disruption completely at random (DCAR). These are models whose disruption procedure does not use information from class labels or attribute values. For example, Symmetric default label noise [71] always chooses the same class label as the noisy value, regardless of the class and attribute values of each sample, whereas Symmetric completely-uniform label noise [44] chooses the value of the noisy label uniformly among all the possibilities in the domain.
2. Disruption at random (DAR). These are models whose disruption procedure uses only the same information source as the noise type they introduce. For example, Symmetric adjacent label noise [89] chooses one of the classes adjacent to that of the sample to corrupt, whereas Symmetric Gaussian attribute noise [17] adds to the original attribute value a random quantity drawn from a zero-mean Gaussian distribution.
3. Disruption partially at random (DPAR). These are models whose disruption procedure uses the opposite information source to the noise type they introduce. Even though this characteristic is not usually considered in the literature, it is interesting to define it for potential noise models.
4. Disruption not at random (DNAR). These are models whose disruption procedure uses both class and attribute information. For example, in Symmetric nearest-neighbor label noise [41], the noisy label for each sample is taken from its closest sample of a different class; it therefore uses class and attribute information to determine the new noisy labels. A minimal sketch of this kind of disruption follows this list.
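A minimal sketch of the DNAR idea behind Symmetric nearest-neighbor label noise [41] (taking the noisy label from the closest sample of a different class) could look as follows; the helper name and interface are assumptions for illustration.

```python
import numpy as np

def nearest_neighbor_disruption(X, y, selected):
    """DNAR-style disruption: each selected sample receives the label of its
    closest sample from a different class, so both attributes and labels are
    used. Illustrative sketch only."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    noisy_y = y.copy()
    for i in selected:
        d = np.linalg.norm(X - X[i], axis=1)
        d[y == y[i]] = np.inf            # exclude samples sharing the current label
        noisy_y[i] = y[np.argmin(d)]     # label of the nearest different-class sample
    return noisy_y
```

It can be chained with the borderline selection sketched earlier, e.g. `noisy_y = nearest_neighbor_disruption(X, y, borderline_selection(X, y, 0.1))`.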
4.4. Selection Distribution
- Symmetric [44,71]. Each sample follows a Bernoulli distribution, with the noise level as its parameter, to determine whether it is corrupted. Thus, the n samples of all the classes/attributes (according to the type of noise introduced) jointly follow a multivariate Bernoulli distribution with the same parameter and therefore all have the same probability of being selected. A short sketch contrasting this symmetric selection with an asymmetric, per-class selection follows.
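The following lines sketch the difference between this symmetric selection and an asymmetric one (a different Bernoulli parameter per class, as in NAR models); the per-class probabilities below are arbitrary illustrative values, not taken from any cited model.

```python
import numpy as np

rng = np.random.default_rng(0)
y = np.array([0, 0, 1, 1, 1, 2, 2, 0, 1, 2])   # toy label vector
noise_level = 0.2

# Symmetric selection: every sample follows the same Bernoulli(noise_level) trial.
symmetric_mask = rng.random(y.size) < noise_level

# Asymmetric selection: a different noise probability per class label
# (the values below are illustrative assumptions).
per_class_level = {0: 0.1, 1: 0.3, 2: 0.0}
asymmetric_mask = rng.random(y.size) < np.array([per_class_level[k] for k in y])
```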
4.5. Disruption Distribution
5. Label Noise Models
6. Attribute Noise Models
- if , ;
- if , ;
- if , one of the above options is chosen;
7. Conclusions and Future Directions
Funding
Conflicts of Interest
References
- Yu, Z.; Wang, D.; Zhao, Z.; Chen, C.L.P.; You, J.; Wong, H.; Zhang, J. Hybrid incremental ensemble learning for noisy real-world data classification. IEEE Trans. Cybern. 2019, 49, 403–416.
- Gupta, S.; Gupta, A. Dealing with noise problem in machine learning data-sets: A systematic review. Procedia Comput. Sci. 2019, 161, 466–474.
- Martín, J.; Sáez, J.A.; Corchado, E. On the regressand noise problem: Model robustness and synergy with regression-adapted noise filters. IEEE Access 2021, 9, 145800–145816.
- Liu, T.; Tao, D. Classification with noisy labels by importance reweighting. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 38, 447–461.
- Xia, S.; Liu, Y.; Ding, X.; Wang, G.; Yu, H.; Luo, Y. Granular ball computing classifiers for efficient, scalable and robust learning. Inf. Sci. 2019, 483, 136–152.
- Nematzadeh, Z.; Ibrahim, R.; Selamat, A. Improving class noise detection and classification performance: A new two-filter CNDC model. Appl. Soft Comput. 2020, 94, 106428.
- Zeng, S.; Duan, X.; Li, H.; Xiao, Z.; Wang, Z.; Feng, D. Regularized fuzzy discriminant analysis for hyperspectral image classification with noisy labels. IEEE Access 2019, 7, 108125–108136.
- Sáez, J.A.; Corchado, E. ANCES: A novel method to repair attribute noise in classification problems. Pattern Recognit. 2022, 121, 108198.
- Adeli, E.; Thung, K.; An, L.; Wu, G.; Shi, F.; Wang, T.; Shen, D. Semi-supervised discriminative classification robust to sample-outliers and feature-noises. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 41, 515–522.
- Tian, Y.; Sun, M.; Deng, Z.; Luo, J.; Li, Y. A new fuzzy set and nonkernel SVM approach for mislabeled binary classification with applications. IEEE Trans. Fuzzy Syst. 2017, 25, 1536–1545.
- Yu, Z.; Lan, K.; Liu, Z.; Han, G. Progressive ensemble kernel-based broad learning system for noisy data classification. IEEE Trans. Cybern. 2022, 52, 9656–9669.
- Xia, S.; Zheng, S.; Wang, G.; Gao, X.; Wang, B. Granular ball sampling for noisy label classification or imbalanced classification. IEEE Trans. Neural Netw. Learn. Syst. 2021, in press.
- Xia, S.; Zheng, Y.; Wang, G.; He, P.; Li, H.; Chen, Z. Random space division sampling for label-noisy classification or imbalanced classification. IEEE Trans. Cybern. 2021, in press.
- Huang, L.; Shao, Y.; Zhang, J.; Zhao, Y.; Teng, J. Robust rescaled hinge loss twin support vector machine for imbalanced noisy classification. IEEE Access 2019, 7, 65390–65404.
- Garcia, L.P.F.; Lehmann, J.; de Carvalho, A.C.P.L.F.; Lorena, A.C. New label noise injection methods for the evaluation of noise filters. Knowl.-Based Syst. 2019, 163, 693–704.
- Tomasev, N.; Buza, K. Hubness-aware kNN classification of high-dimensional data in presence of label noise. Neurocomputing 2015, 160, 157–172.
- Sáez, J.A.; Galar, M.; Luengo, J.; Herrera, F. Analyzing the presence of noise in multi-class problems: Alleviating its influence with the One-vs-One decomposition. Knowl. Inf. Syst. 2014, 38, 179–206.
- Frénay, B.; Verleysen, M. Classification in the presence of label noise: A survey. IEEE Trans. Neural Netw. Learn. Syst. 2014, 25, 845–869.
- Nettleton, D.; Orriols-Puig, A.; Fornells, A. A study of the effect of different types of noise on the precision of supervised learning techniques. Artif. Intell. Rev. 2010, 33, 275–306.
- Zhao, Z.; Chu, L.; Tao, D.; Pei, J. Classification with label noise: A Markov chain sampling framework. Data Min. Knowl. Discov. 2019, 33, 1468–1504.
- Li, J.; Zhu, Q.; Wu, Q.; Zhang, Z.; Gong, Y.; He, Z.; Zhu, F. SMOTE-NaN-DE: Addressing the noisy and borderline examples problem in imbalanced classification by natural neighbors and differential evolution. Knowl.-Based Syst. 2021, 223, 107056.
- Bootkrajang, J.; Chaijaruwanich, J. Towards instance-dependent label noise-tolerant classification: A probabilistic approach. Pattern Anal. Appl. 2020, 23, 95–111.
- Hendrycks, D.; Mazeika, M.; Wilson, D.; Gimpel, K. Using trusted data to train deep networks on labels corrupted by severe noise. Adv. Neural Inf. Process. Syst. 2018, 31, 10477–10486.
- Shanthini, A.; Vinodhini, G.; Chandrasekaran, R.M.; Supraja, P. A taxonomy on impact of label noise and feature noise using machine learning techniques. Soft Comput. 2019, 23, 8597–8607.
- Koziarski, M.; Krawczyk, B.; Wozniak, M. Radial-based oversampling for noisy imbalanced data classification. Neurocomputing 2019, 343, 19–33.
- Teng, C. Polishing blemishes: Issues in data correction. IEEE Intell. Syst. 2004, 19, 34–39.
- Kazmierczak, S.; Mandziuk, J. A committee of convolutional neural networks for image classification in the concurrent presence of feature and label noise. In Proceedings of the 16th International Conference on Parallel Problem Solving from Nature, Leiden, The Netherlands, 5–9 September 2020; LNCS Volume 12269, pp. 498–511.
- Mirzasoleiman, B.; Cao, K.; Leskovec, J. Coresets for robust training of deep neural networks against noisy labels. Adv. Neural Inf. Process. Syst. 2020, 33, 11465–11477.
- Kang, J.; Fernandez-Beltran, R.; Kang, X.; Ni, J.; Plaza, A. Noise-tolerant deep neighborhood embedding for remotely sensed images with label noise. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 2551–2562.
- Koziarski, M.; Wozniak, M.; Krawczyk, B. Combined cleaning and resampling algorithm for multi-class imbalanced data with label noise. Knowl.-Based Syst. 2020, 204, 106223.
- Xia, S.; Wang, G.; Chen, Z.; Duan, Y.; Liu, Q. Complete random forest based class noise filtering learning for improving the generalizability of classifiers. IEEE Trans. Knowl. Data Eng. 2019, 31, 2063–2078.
- Zhang, W.; Wang, D.; Tan, X. Robust class-specific autoencoder for data cleaning and classification in the presence of label noise. Neural Process. Lett. 2019, 50, 1845–1860.
- Chen, B.; Xia, S.; Chen, Z.; Wang, B.; Wang, G. RSMOTE: A self-adaptive robust SMOTE for imbalanced problems with label noise. Inf. Sci. 2021, 553, 397–428.
- Pakrashi, A.; Namee, B.M. KalmanTune: A Kalman filter based tuning method to make boosted ensembles robust to class-label noise. IEEE Access 2020, 8, 145887–145897.
- Salekshahrezaee, Z.; Leevy, J.L.; Khoshgoftaar, T.M. A reconstruction error-based framework for label noise detection. J. Big Data 2021, 8, 1–16.
- Abellán, J.; Mantas, C.J.; Castellano, J.G. AdaptativeCC4.5: Credal C4.5 with a rough class noise estimator. Expert Syst. Appl. 2018, 92, 363–379.
- Wang, C.; Shi, J.; Zhou, Y.; Li, L.; Yang, X.; Zhang, T.; Wei, S.; Zhang, X.; Tao, C. Label noise modeling and correction via loss curve fitting for SAR ATR. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–10.
- Wei, Y.; Gong, C.; Chen, S.; Liu, T.; Yang, J.; Tao, D. Harnessing side information for classification under label noise. IEEE Trans. Neural Netw. Learn. Syst. 2020, 31, 3178–3192.
- Chen, P.; Liao, B.; Chen, G.; Zhang, S. Understanding and utilizing deep neural networks trained with noisy labels. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; PMLR Volume 97, pp. 1062–1070.
- Song, H.; Kim, M.; Lee, J.G. SELFIE: Refurbishing unclean samples for robust deep learning. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; PMLR Volume 97, pp. 5907–5915.
- Seo, P.H.; Kim, G.; Han, B. Combinatorial inference against label noise. Adv. Neural Inf. Process. Syst. 2019, 32, 1171–1181.
- Wu, P.; Zheng, S.; Goswami, M.; Metaxas, D.N.; Chen, C. A topological filter for learning with label noise. Adv. Neural Inf. Process. Syst. 2020, 33, 21382–21393.
- Cheng, J.; Liu, T.; Ramamohanarao, K.; Tao, D. Learning with bounded instance and label-dependent label noise. In Proceedings of the 37th International Conference on Machine Learning, Online, 13–18 July 2020; PMLR Volume 119, pp. 1789–1799.
- Ghosh, A.; Lan, A.S. Contrastive learning improves model robustness under label noise. In Proceedings of the 2021 IEEE Conference on Computer Vision and Pattern Recognition Workshops, Virtual, 19–25 July 2021; pp. 2703–2708.
- Wang, Z.; Hu, G.; Hu, Q. Training noise-robust deep neural networks via meta-learning. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 14–19 June 2020; pp. 4523–4532.
- Jindal, I.; Pressel, D.; Lester, B.; Nokleby, M.S. An effective label noise model for DNN text classification. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 9–14 June 2019; pp. 3246–3256.
- Scott, C.; Blanchard, G.; Handy, G. Classification with asymmetric label noise: Consistency and maximal denoising. In Proceedings of the 26th Annual Conference on Learning Theory, Princeton, NJ, USA, 12–14 June 2013; JMLR Volume 30, pp. 489–511.
- Yang, P.; Ormerod, J.; Liu, W.; Ma, C.; Zomaya, A.; Yang, J. AdaSampling for positive-unlabeled and label noise learning with bioinformatics applications. IEEE Trans. Cybern. 2019, 49, 1932–1943.
- Feng, L.; Shu, S.; Lin, Z.; Lv, F.; Li, L.; An, B. Can cross entropy loss be robust to label noise? In Proceedings of the 29th International Joint Conference on Artificial Intelligence, Yokohama, Japan, 11–17 July 2020; pp. 2206–2212.
- Ghosh, A.; Kumar, H.; Sastry, P. Robust loss functions under label noise for deep neural networks. In Proceedings of the 31st AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; pp. 1919–1925.
- Tanaka, D.; Ikami, D.; Yamasaki, T.; Aizawa, K. Joint optimization framework for learning with noisy labels. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 5552–5560.
- Sun, Z.; Liu, H.; Wang, Q.; Zhou, T.; Wu, Q.; Tang, Z. Co-LDL: A co-training-based label distribution learning method for tackling label noise. IEEE Trans. Multimed. 2022, 24, 1093–1104.
- Li, J.; Wong, Y.; Zhao, Q.; Kankanhalli, M.S. Learning to learn from noisy labeled data. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 5051–5059.
- Harutyunyan, H.; Reing, K.; Steeg, G.V.; Galstyan, A. Improving generalization by controlling label-noise information in neural network weights. In Proceedings of the 37th International Conference on Machine Learning, Online, 13–18 July 2020; PMLR Volume 119, pp. 4071–4081.
- Han, B.; Yao, Q.; Yu, X.; Niu, G.; Xu, M.; Hu, W.; Tsang, I.W.; Sugiyama, M. Co-teaching: Robust training of deep neural networks with extremely noisy labels. Adv. Neural Inf. Process. Syst. 2018, 31, 8536–8546.
- Nikolaidis, K.; Plagemann, T.; Kristiansen, S.; Goebel, V.; Kankanhalli, M. Using under-trained deep ensembles to learn under extreme label noise: A case study for sleep apnea detection. IEEE Access 2021, 9, 45919–45934.
- Bootkrajang, J.; Kabán, A. Learning kernel logistic regression in the presence of class label noise. Pattern Recognit. 2014, 47, 3641–3655.
- Mannino, M.V.; Yang, Y.; Ryu, Y. Classification algorithm sensitivity to training data with non representative attribute noise. Decis. Support Syst. 2009, 46, 743–751.
- Ghosh, A.; Manwani, N.; Sastry, P.S. On the robustness of decision tree learning under label noise. In Proceedings of the 21st Conference on Advances in Knowledge Discovery and Data Mining, Jeju, Korea, 23–26 May 2017; LNCS Volume 10234, pp. 685–697.
- Arazo, E.; Ortego, D.; Albert, P.; O’Connor, N.E.; McGuinness, K. Unsupervised label noise modeling and loss correction. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; PMLR Volume 97, pp. 312–321.
- Liu, D.; Yang, G.; Wu, J.; Zhao, J.; Lv, F. Robust binary loss for multi-category classification with label noise. In Proceedings of the 2021 IEEE International Conference on Acoustics, Speech and Signal Processing, Toronto, ON, Canada, 6–11 June 2021; pp. 1700–1704.
- Ghosh, A.; Manwani, N.; Sastry, P.S. Making risk minimization tolerant to label noise. Neurocomputing 2015, 160, 93–107.
- Ortego, D.; Arazo, E.; Albert, P.; O’Connor, N.E.; McGuinness, K. Towards robust learning with different label noise distributions. In Proceedings of the 25th International Conference on Pattern Recognition, Milan, Italy, 10–15 January 2020; pp. 7020–7027.
- Fatras, K.; Damodaran, B.; Lobry, S.; Flamary, R.; Tuia, D.; Courty, N. Wasserstein adversarial regularization for learning with label noise. IEEE Trans. Pattern Anal. Mach. Intell. 2021, in press.
- Qin, Z.; Zhang, Z.; Li, Y.; Guo, J. Making deep neural networks robust to label noise: Cross-training with a novel loss function. IEEE Access 2019, 7, 130893–130902.
- Schneider, J.; Handali, J.P.; vom Brocke, J. Increasing trust in (big) data analytics. In Proceedings of the 2018 Advanced Information Systems Engineering Workshops, Tallinn, Estonia, 11–15 June 2018; LNBIP Volume 316, pp. 70–84.
- Huang, X.; Shi, L.; Suykens, J.A.K. Support vector machine classifier with pinball loss. IEEE Trans. Pattern Anal. Mach. Intell. 2014, 36, 984–997.
- Bi, J.; Zhang, T. Support vector classification with input data uncertainty. Adv. Neural Inf. Process. Syst. 2004, 17, 161–168.
- Bootkrajang, J. A generalised label noise model for classification in the presence of annotation errors. Neurocomputing 2016, 192, 61–71.
- Bootkrajang, J. A generalised label noise model for classification. In Proceedings of the 23rd European Symposium on Artificial Neural Networks, Bruges, Belgium, 22–23 April 2015; pp. 349–354.
- Ren, M.; Zeng, W.; Yang, B.; Urtasun, R. Learning to reweight examples for robust deep learning. In Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; PMLR Volume 80, pp. 4331–4340.
- Prati, R.C.; Luengo, J.; Herrera, F. Emerging topics and challenges of learning from noisy data in nonstandard classification: A survey beyond binary class noise. Knowl. Inf. Syst. 2019, 60, 63–97.
- Du, J.; Cai, Z. Modelling class noise with symmetric and asymmetric distributions. In Proceedings of the 29th AAAI Conference on Artificial Intelligence, Austin, TX, USA, 25–30 January 2015; pp. 2589–2595.
- Görnitz, N.; Porbadnigk, A.; Binder, A.; Sannelli, C.; Braun, M.L.; Müller, K.; Kloft, M. Learning and evaluation in presence of non-i.i.d. label noise. In Proceedings of the 17th International Conference on Artificial Intelligence and Statistics, Reykjavik, Iceland, 22–25 April 2014; PMLR Volume 33, pp. 293–302.
- Gehlot, S.; Gupta, A.; Gupta, R. A CNN-based unified framework utilizing projection loss in unison with label noise handling for multiple Myeloma cancer diagnosis. Med. Image Anal. 2021, 72, 102099.
- Denham, B.; Pears, R.; Naeem, M.A. Null-labelling: A generic approach for learning in the presence of class noise. In Proceedings of the 20th IEEE International Conference on Data Mining, Sorrento, Italy, 17–20 November 2020; pp. 990–995.
- Sáez, J.A.; Galar, M.; Luengo, J.; Herrera, F. Tackling the problem of classification with noisy data using multiple classifier systems: Analysis of the performance and robustness. Inf. Sci. 2013, 247, 1–20.
- Khoshgoftaar, T.M.; Hulse, J.V. Empirical case studies in attribute noise detection. IEEE Trans. Syst. Man Cybern. Part C Appl. Rev. 2009, 39, 379–388.
- Kaneko, T.; Ushiku, Y.; Harada, T. Label-noise robust generative adversarial networks. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 2462–2471.
- Ghosh, A.; Lan, A.S. Do we really need gold samples for sample weighting under label noise? In Proceedings of the 2021 IEEE Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–8 January 2021; pp. 3921–3930.
- Wang, Q.; Han, B.; Liu, T.; Niu, G.; Yang, J.; Gong, C. Tackling instance-dependent label noise via a universal probabilistic model. In Proceedings of the 35th AAAI Conference on Artificial Intelligence, Online, 2–9 February 2021; pp. 10183–10191.
- Petety, A.; Tripathi, S.; Hemachandra, N. Attribute noise robust binary classification. In Proceedings of the 34th AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 13897–13898.
- Zhang, Y.; Zheng, S.; Wu, P.; Goswami, M.; Chen, C. Learning with feature-dependent label noise: A progressive approach. In Proceedings of the 9th International Conference on Learning Representations, Online, 3–7 May 2021; pp. 1–13.
- Wei, J.; Liu, Y. When optimizing f-divergence is robust with label noise. In Proceedings of the 9th International Conference on Learning Representations, Online, 3–7 May 2021; pp. 1–11.
- Chen, P.; Ye, J.; Chen, G.; Zhao, J.; Heng, P. Beyond class-conditional assumption: A primary attempt to combat instance-dependent label noise. In Proceedings of the 35th AAAI Conference on Artificial Intelligence, Online, 2–9 February 2021; pp. 11442–11450.
- Nicholson, B.; Sheng, V.S.; Zhang, J. Label noise correction and application in crowdsourcing. Expert Syst. Appl. 2016, 66, 149–162.
- Amid, E.; Warmuth, M.K.; Srinivasan, S. Two-temperature logistic regression based on the Tsallis divergence. In Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, Naha, Japan, 16 April 2019; PMLR Volume 89, pp. 2388–2396.
- Thulasidasan, S.; Bhattacharya, T.; Bilmes, J.A.; Chennupati, G.; Mohd-Yusof, J. Combating label noise in deep learning using abstention. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; PMLR Volume 97, pp. 6234–6243.
- Cano, J.R.; Luengo, J.; García, S. Label noise filtering techniques to improve monotonic classification. Neurocomputing 2019, 353, 83–95.
- Pu, X.; Li, C. Probabilistic information-theoretic discriminant analysis for industrial label-noise fault diagnosis. IEEE Trans. Ind. Inform. 2021, 17, 2664–2674.
- Han, B.; Yao, J.; Niu, G.; Zhou, M.; Tsang, I.W.; Zhang, Y.; Sugiyama, M. Masking: A new perspective of noisy supervision. Adv. Neural Inf. Process. Syst. 2018, 31, 5841–5851.
- Folleco, A.; Khoshgoftaar, T.M.; Hulse, J.V.; Bullard, L.A. Software quality modeling: The impact of class noise on the random forest classifier. In Proceedings of the 2008 IEEE Congress on Evolutionary Computation, Hong Kong, 1–6 June 2008; pp. 3853–3859.
- Zhu, X.; Wu, X. Cost-guided class noise handling for effective cost-sensitive learning. In Proceedings of the 4th IEEE International Conference on Data Mining, Brighton, UK, 1–4 November 2004; pp. 297–304.
- Kang, J.; Fernández-Beltran, R.; Duan, P.; Kang, X.; Plaza, A.J. Robust normalized softmax loss for deep metric learning-based characterization of remote sensing images with label noise. IEEE Trans. Geosci. Remote Sens. 2021, 59, 8798–8811.
- Fefilatyev, S.; Shreve, M.; Kramer, K.; Hall, L.O.; Goldgof, D.B.; Kasturi, R.; Daly, K.; Remsen, A.; Bunke, H. Label-noise reduction with support vector machines. In Proceedings of the 21st International Conference on Pattern Recognition, Munich, Germany, 30 July–2 August 2012; pp. 3504–3508.
- Huang, L.; Zhang, C.; Zhang, H. Self-adaptive training: Beyond empirical risk minimization. Adv. Neural Inf. Process. Syst. 2020, 33, 19365–19376.
- Ramdas, A.; Póczos, B.; Singh, A.; Wasserman, L.A. An analysis of active learning with uniform feature noise. In Proceedings of the 17th International Conference on Artificial Intelligence and Statistics, Reykjavik, Iceland, 22–25 April 2014; JMLR Volume 33, pp. 805–813.
- Yuan, W.; Guan, D.; Ma, T.; Khattak, A. Classification with class noises through probabilistic sampling. Inf. Fusion 2018, 41, 57–67.
- Ekambaram, R.; Fefilatyev, S.; Shreve, M.; Kramer, K.; Hall, L.; Goldgof, D.; Kasturi, R. Active cleaning of label noise. Pattern Recognit. 2016, 51, 463–480.
- Zhang, T.; Deng, Z.; Ishibuchi, H.; Pang, L. Robust TSK fuzzy system based on semisupervised learning for label noise data. IEEE Trans. Fuzzy Syst. 2021, 29, 2145–2157.
- Berthon, A.; Han, B.; Niu, G.; Liu, T.; Sugiyama, M. Confidence scores make instance-dependent label-noise learning possible. In Proceedings of the 38th International Conference on Machine Learning, Online, 18–24 July 2021; PMLR Volume 139, pp. 825–836.
- Sáez, J.A.; Krawczyk, B.; Woźniak, M. On the influence of class noise in medical data classification: Treatment using noise filtering methods. Appl. Artif. Intell. 2016, 30, 590–609.
- Baldomero-Naranjo, M.; Martínez-Merino, L.; Rodríguez-Chía, A. A robust SVM-based approach with feature selection and outliers detection for classification problems. Expert Syst. Appl. 2021, 178, 115017.
- Hosmer, D.W.; Lemeshow, S. Applied Logistic Regression; John Wiley and Sons: Hoboken, NJ, USA, 2000.
- Dombrowski, A.K.; Anders, C.J.; Müller, K.R.; Kessel, P. Towards robust explanations for deep neural networks. Pattern Recognit. 2022, 121, 108194.
- Belarouci, S.; Chikh, M. Medical imbalanced data classification. Adv. Sci. Technol. Eng. Syst. J. 2017, 2, 116–124.
- Bao, F.; Deng, Y.; Kong, Y.; Ren, Z.; Suo, J.; Dai, Q. Learning deep landmarks for imbalanced classification. IEEE Trans. Neural Netw. Learn. Syst. 2020, 31, 2691–2704.
- Zhang, C.; Bengio, S.; Hardt, M.; Recht, B.; Vinyals, O. Understanding deep learning (still) requires rethinking generalization. Commun. ACM 2021, 64, 107–115.
- Pradhan, A.; Mishra, D.; Das, K.; Panda, G.; Kumar, S.; Zymbler, M. On the classification of MR images using “ELM-SSA” coated hybrid model. Mathematics 2021, 9, 2095.
- Iam-On, N. Clustering data with the presence of attribute noise: A study of noise completely at random and ensemble of multiple k-means clusterings. Int. J. Mach. Learn. Cybern. 2020, 11, 491–509.
- Zhang, Z.; Zhang, H.; Arik, S.Ö.; Lee, H.; Pfister, T. Distilling effective supervision from severe label noise. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 9291–9300.
- Lee, K.; He, X.; Zhang, L.; Yang, L. CleanNet: Transfer learning for scalable image classifier training with label noise. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 5447–5456.
Notation | Description | Notation | Description
--- | --- | --- | ---
D | Original dataset to be corrupted with noise. | | j-th attribute in dataset D.
n | Number of samples contained in dataset D. | max | Maximum value of the j-th attribute.
m | Number of attributes contained in dataset D. | min | Minimum value of the j-th attribute.
c | Number of class labels contained in dataset D. | mean | Mean value of the j-th attribute.
 | i-th sample in dataset D. | median | Median value of the j-th attribute.
 | Set of indices of samples in D. | var | Variance value of the j-th attribute.
 | Output class corresponding to dataset D. | | Original value of the j-th variable in the i-th sample.
 | Set of class labels in dataset D. | | Noisy value of the j-th variable in the i-th sample.
 | k-th output class label in the set of class labels. | | Set of indices of variables in D.
 | Proportion of samples with a given class label in D. | | Noise level used by the noise model.
Component | Identifier | Description | Ref.
--- | --- | --- | ---
Type | label | Noise affects the class labels of some of the samples in the dataset. | [22]
 | attribute | Noise affects the attribute values of some samples. | [78]
 | combined | Noise affects the labels and attributes of some samples. | [26]
Selection | symmetric | Samples in all classes or attributes have an equal probability of being corrupted. | [38]
 | asymmetric | Samples in each class or attribute have a different probability of being corrupted. | [20]
 | unconditional | Noise unconditionally affects all samples in the dataset to be corrupted. | [67]
 | majority-class | Random choice of samples from the majority class within the dataset to be corrupted. | [21]
 | Gaussian | A Gaussian distribution determines noise probabilities using distances to decision boundaries. | [22]
 | Gamma | A Gamma distribution determines noise probabilities using distances to decision boundaries. | [70]
 | one-dimensional | Given c intervals defined over a single attribute, samples are corrupted according to the interval their attribute value falls in and their class label. | [74]
Disruption | uniform | The noisy value is randomly chosen within the domain of the variable, excluding the original value. | [38]
 | completely-uniform | The noisy value is randomly chosen within the domain of the variable, including the original value. | [44]
 | default | Original clean values are replaced by a fixed noisy value within the domain of the variable to corrupt. | [71]
 | Gaussian | A random value following a zero-mean Gaussian distribution is added to the original attribute value. | [17]
 | natural-distribution | A random value chosen with probability proportional to the original distribution replaces the original value. | [72]
 | unit-simplex | The probability of choosing each value as noisy is determined by a k-dimensional unit simplex. | [46]
 | bidirectional | Given a pair of values (a, b) of a variable, samples with value a change it to b and vice versa. | [33]
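As an illustration of the Gaussian disruption identifier above combined with a symmetric selection, in the spirit of Symmetric Gaussian attribute noise [17], the sketch below adds zero-mean Gaussian noise to selected attribute values. Scaling the standard deviation by each attribute's own spread is an assumption about the magnitude of errors, not a prescription from the cited work, and all names are illustrative.

```python
import numpy as np

def gaussian_attribute_noise(X, noise_level, scale=0.1, rng=None):
    """Corrupt attribute values with additive zero-mean Gaussian noise.
    Each value is selected with probability `noise_level` (symmetric
    selection); the standard deviation is `scale` times the attribute's
    standard deviation (an assumed, relative magnitude). Sketch only."""
    rng = np.random.default_rng() if rng is None else rng
    X = np.asarray(X, dtype=float).copy()
    selected = rng.random(X.shape) < noise_level        # per-value Bernoulli selection
    sigma = scale * X.std(axis=0, keepdims=True)        # per-attribute noise scale
    noise = rng.normal(0.0, sigma, size=X.shape)        # zero-mean Gaussian disruption
    X[selected] += noise[selected]
    return X
```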
Noise Model | Ref. | Noise Model | Ref. |
Label noise models | |||
Asymmetric default label noise | [72] | PMD-based confidence label noise | [83] |
Asymmetric sparse label noise | [84] | Quadrant-based uniform label noise | [62] |
Asymmetric uniform label noise | [20] | Score-based confidence label noise | [85] |
Attribute-mean uniform label noise | [86] | Sigmoid-bounded uniform label noise | [43] |
Clustering-based voting label noise | [81] | Small-margin borderline label noise | [87] |
Exponential borderline label noise | [69] | Smudge-based completely-uniform label noise | [88] |
Exponential/smudge completely-uniform label noise | [76] | Symmetric adjacent label noise | [89] |
Fraud bidirectional label noise | [35] | Symmetric center-based label noise | [90] |
Gamma borderline label noise | [70] | Symmetric completely-uniform label noise | [44] |
Gaussian borderline label noise | [22] | Symmetric confusion label noise | [63] |
Gaussian-level uniform label noise | [61] | Symmetric default label noise | [71] |
Gaussian-mixture borderline label noise | [22] | Symmetric diametrical label noise | [72] |
Hubness-proportional uniform label noise | [16] | Symmetric double-default label noise | [91] |
IR-stable bidirectional label noise | [33] | Symmetric double-random label noise | [80] |
Laplace borderline label noise | [73] | Symmetric exchange label noise | [66] |
Large-margin uniform label noise | [87] | Symmetric hierarchical label noise | [23] |
Majority-class unidirectional label noise | [21] | Symmetric hierarchical/next-class label noise | [79] |
Minority-driven bidirectional label noise | [92] | Symmetric natural-distribution label noise | [72] |
Minority-proportional uniform label noise | [93] | Symmetric nearest-neighbor label noise | [41] |
Misclassification prediction label noise | [81] | Symmetric next-class label noise | [75] |
Multiple-class unidirectional label noise | [81] | Symmetric non-uniform label noise | [94] |
Neighborwise borderline label noise | [15] | Symmetric optimistic label noise | [72] |
Non-linearwise borderline label noise | [15] | Symmetric pessimistic label noise | [72] |
One-dimensional uniform label noise | [74] | Symmetric uniform label noise | [38] |
Open-set ID/nearest-neighbor label noise | [41] | Symmetric unit-simplex label noise | [46] |
Open-set ID/uniform label noise | [41] | Uneven-Gaussian borderline label noise | [73] |
Pairwise bidirectional label noise | [95] | Uneven-Laplace borderline label noise | [73] |
Attribute noise models | |||
Asymmetric interval-based attribute noise | [58] | Symmetric scaled-Gaussian attribute noise | [25] |
Asymmetric uniform attribute noise | [82] | Symmetric uniform attribute noise | [77] |
Boundary/dependent Gaussian attribute noise | [68] | Symmetric/dependent Gaussian attribute noise | [67] |
Importance interval-based attribute noise | [58] | Symmetric/dependent Gaussian-image attribute noise | [96] |
Symmetric completely-uniform attribute noise | [26] | Symmetric/dependent random-pixel attribute noise | [96] |
Symmetric end-directed attribute noise | [78] | Symmetric/dependent uniform attribute noise | [82] |
Symmetric Gaussian attribute noise | [17] | Unconditional fixed-width attribute noise | [97] |
Symmetric interval-based attribute noise | [58] | Unconditional vp-Gaussian attribute noise | [67] |
Combined noise models | |||
Symmetric completely-uniform combined noise | [26] | Unconditional/symmetric Gaussian/uniform combined noise | [27] |