Search Results (7)

Search Parameters:
Keywords = mislabeled data filter

19 pages, 3365 KB  
Article
Robust Federated Learning Against Data Poisoning Attacks: Prevention and Detection of Attacked Nodes
by Pretom Roy Ovi and Aryya Gangopadhyay
Electronics 2025, 14(15), 2970; https://doi.org/10.3390/electronics14152970 - 25 Jul 2025
Cited by 1 | Viewed by 2232
Abstract
Federated learning (FL) enables collaborative model building among a large number of participants without sharing sensitive data with a central server. Because of its distributed nature, FL has limited control over local data and the corresponding training process, which makes it susceptible to data poisoning attacks in which malicious workers train the model on corrupted data. Attackers on the worker side can easily manipulate local data by swapping the labels of training instances, adding noise to them, or injecting out-of-distribution instances. Workers under such attacks carry incorrect information to the server, poison the global model, and cause misclassifications, so the prevention and detection of data poisoning attacks is crucial for a robust federated training framework. To address this, we propose a prevention strategy, confident federated learning, to protect workers from such attacks. The strategy first validates the label quality of local training samples by characterizing and identifying label errors in the local training data, and then excludes the detected mislabeled samples from local training. We evaluate this approach on both image and audio domains, and the experimental results validate its robustness in preventing data poisoning attacks: it detects mislabeled training samples with above 85% accuracy and excludes them from the training set. However, the prevention strategy is effective only up to a certain percentage of poisonous samples; beyond that, detection of the attacked workers is needed. We therefore also propose a novel detection strategy that creates a class-wise cluster representation for every participating worker from the neuron activation maps of the local models and analyzes the resulting clusters to filter out attacked workers before model aggregation. Experiments demonstrate that this detection strategy identifies workers affected by data poisoning attacks, along with the attack type, e.g., label flipping or dirty labeling. Moreover, the global model could not converge even after a large number of training rounds in the presence of malicious workers, whereas detecting the malicious workers and discarding them from aggregation allowed the global model to converge within very few rounds. The approach stays robust under different data distributions and model sizes and does not require prior knowledge of the number of attackers in the system.
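
As a rough illustration of the label-validation step described in the abstract above, the sketch below flags suspect labels from out-of-sample predicted probabilities using a confident-learning-style threshold. It is a minimal sketch, not the authors' implementation; the function name, the thresholding rule, and the assumption that each worker already has cross-validated class probabilities are illustrative.

```python
# Hypothetical sketch: filter likely-mislabeled samples before local FL training.
# Assumes out-of-sample predicted probabilities are available (e.g. via cross-validation).
import numpy as np

def find_suspect_labels(pred_probs: np.ndarray, given_labels: np.ndarray) -> np.ndarray:
    """Return a boolean mask of samples whose given label looks unreliable.

    pred_probs   : (n_samples, n_classes) out-of-sample class probabilities
    given_labels : (n_samples,) integer labels supplied with the data
    """
    n_classes = pred_probs.shape[1]
    # Per-class "self-confidence" threshold: mean predicted probability of class k
    # over the samples that are labeled k (a confident-learning-style heuristic).
    thresholds = np.array([
        pred_probs[given_labels == k, k].mean() if np.any(given_labels == k) else 1.0
        for k in range(n_classes)
    ])
    self_conf = pred_probs[np.arange(len(given_labels)), given_labels]
    disagrees = pred_probs.argmax(axis=1) != given_labels
    return disagrees & (self_conf < thresholds[given_labels])

# Each worker would drop the flagged samples before its local training round:
# clean_idx = ~find_suspect_labels(pred_probs, labels)
# local_model.fit(X[clean_idx], labels[clean_idx])
```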

23 pages, 7950 KB  
Article
Tripartite: Tackling Realistic Noisy Labels with More Precise Partitions
by Lida Yu, Xuefeng Liang, Chang Cao, Longshan Yao and Xingyu Liu
Sensors 2025, 25(11), 3369; https://doi.org/10.3390/s25113369 - 27 May 2025
Viewed by 1074
Abstract
Samples in large-scale datasets may be mislabeled for various reasons, and under conventional training procedures deep models tend to over-fit these noisy samples. The key is to alleviate the harm of such noisy labels. Many existing methods divide the training data into clean and noisy subsets based on loss values. We observe that one reason deep models fall short is uncertain samples, which have relatively small losses and often appear in real-world datasets. Because of their small losses, many uncertain noisy samples are placed in the clean subset and then degrade model performance. Instead, we propose a Tripartite solution that partitions the training data into three subsets, uncertain, clean, and noisy, according to the inconsistency between the predictions of two networks and the given labels. Tripartite considerably improves the quality of the clean subset. Moreover, to maximize the value of clean samples in the uncertain subset and minimize the harm of noisy labels, we apply low-weight learning and semi-supervised learning, respectively. Extensive experiments demonstrate that Tripartite filters out noisy samples more precisely and outperforms most state-of-the-art methods on four benchmark datasets, and especially on real-world datasets.
(This article belongs to the Special Issue AI-Based Computer Vision Sensors & Systems)
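
The partition rule below is a simplified, hypothetical reading of the three-way split described in the abstract: a sample is clean when both networks confirm its label, noisy when neither does, and uncertain otherwise. The authors' actual criteria may differ.

```python
# Illustrative sketch (not the authors' exact rule): split training data into
# clean / uncertain / noisy subsets from two networks' predictions and the labels.
import numpy as np

def tripartite_split(pred_a: np.ndarray, pred_b: np.ndarray, labels: np.ndarray):
    """pred_a, pred_b: predicted class ids from the two networks; labels: given labels."""
    agree_a = pred_a == labels
    agree_b = pred_b == labels
    clean = agree_a & agree_b          # both networks confirm the label
    noisy = ~agree_a & ~agree_b        # neither network confirms the label
    uncertain = ~(clean | noisy)       # the two networks disagree about the label
    return clean, uncertain, noisy

# Downstream (per the abstract): train normally on `clean`, apply low-weight
# learning to `uncertain`, and treat `noisy` as unlabeled for semi-supervised learning.
```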

22 pages, 4021 KB  
Article
An Auto-Encoder with Genetic Algorithm for High Dimensional Data: Towards Accurate and Interpretable Outlier Detection
by Jiamu Li, Ji Zhang, Mohamed Jaward Bah, Jian Wang, Youwen Zhu, Gaoming Yang, Lingling Li and Kexin Zhang
Algorithms 2022, 15(11), 429; https://doi.org/10.3390/a15110429 - 15 Nov 2022
Cited by 7 | Viewed by 4860
Abstract
When dealing with high-dimensional data, such as in biometric, e-commerce, or industrial applications, it is extremely hard to capture abnormalities in the full space due to the curse of dimensionality. Furthermore, the large number of features makes it increasingly complicated, yet essential, to provide interpretations for outlier detection results in high-dimensional space. To alleviate these issues, we propose a new model based on a Variational AutoEncoder and Genetic Algorithm (VAEGA) for detecting outliers in subspaces of high-dimensional data. The proposed model employs a neural network to build a probabilistic dimensionality-reduction variational autoencoder (VAE) whose low-dimensional hidden space characterizes the high-dimensional inputs. The hidden vector is sampled randomly from the hidden space to reconstruct the data so that it closely matches the input. The reconstruction error then yields an outlier score, and samples exceeding the threshold are tentatively identified as outliers. In the second step, a genetic algorithm (GA) is used to examine and analyze the abnormal subspaces of the outlier set obtained by the VAE layer. After encoding the outlier dataset's subspaces, the degree of anomaly for the detected subspaces is calculated using a redefined fitness function, and the abnormal subspace for each detected point is obtained by selecting the subspace with the highest degree of anomaly. Clustering of abnormal subspaces helps filter out mislabeled outliers (false positives), and the VAE layer adjusts the network weights based on these false positives. On five public datasets, the VAEGA outlier detection model produces highly interpretable results and outperforms, or is competitive with, contemporary methods.
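
A minimal sketch of the first stage described above: score points by autoencoder reconstruction error and flag the largest errors as candidate outliers. The `reconstruct` callable stands in for a trained (variational) autoencoder, and the contamination rate and thresholding rule are assumptions; the genetic-algorithm subspace search of the second stage is not shown.

```python
# Sketch of reconstruction-error-based outlier scoring (stage one only).
import numpy as np

def reconstruction_outliers(X: np.ndarray, reconstruct, contamination: float = 0.05):
    """Return (scores, mask) where mask marks the top `contamination` fraction."""
    errors = np.mean((X - reconstruct(X)) ** 2, axis=1)   # per-sample MSE
    threshold = np.quantile(errors, 1.0 - contamination)  # data-driven cutoff
    return errors, errors > threshold

# Example with a trivial stand-in "autoencoder" just to make the sketch runnable:
# X = np.random.randn(100, 8)
# scores, is_outlier = reconstruction_outliers(X, lambda x: x * 0.9)
```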

29 pages, 3292 KB  
Article
Detection and Classification of Artifact Distortions in Optical Motion Capture Sequences
by Przemysław Skurowski and Magdalena Pawlyta
Sensors 2022, 22(11), 4076; https://doi.org/10.3390/s22114076 - 27 May 2022
Cited by 5 | Viewed by 3264
Abstract
Optical motion capture systems are prone to errors connected to marker recognition (e.g., occlusion, leaving the scene, or mislabeling). These errors are then corrected in the software, but the process is not perfect, resulting in artifact distortions. In this article, we examine four existing types of artifacts and propose a method for detecting and classifying the distortions. The algorithm is based on derivative analysis, low-pass filtering, mathematical morphology, and a loose predictor. The tests involved multiple simulations using synthetically distorted sequences, performance comparisons against human operators on real-life data, and an applicability analysis for distortion removal.
(This article belongs to the Special Issue Intelligent Sensors for Human Motion Analysis)
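
To make the derivative-plus-low-pass idea concrete, the sketch below flags frames of a single marker trajectory whose high-frequency residual or speed is abnormally large under a robust threshold. The window size, threshold constant, and moving-average filter are assumptions; the paper's detector also uses mathematical morphology and a loose predictor, which are not shown.

```python
# Illustrative, not the authors' algorithm: flag suspicious frames in one trajectory.
import numpy as np

def flag_artifact_frames(traj: np.ndarray, window: int = 9, k: float = 5.0) -> np.ndarray:
    """traj: (n_frames, 3) marker positions. Returns a boolean per-frame mask."""
    kernel = np.ones(window) / window
    smooth = np.column_stack([np.convolve(traj[:, d], kernel, mode="same")
                              for d in range(traj.shape[1])])      # simple low-pass
    residual = np.linalg.norm(traj - smooth, axis=1)               # high-frequency content
    speed = np.linalg.norm(np.gradient(traj, axis=0), axis=1)      # first derivative

    def robust_outliers(x: np.ndarray) -> np.ndarray:
        # Median + k * MAD, so a few spikes do not inflate the cutoff.
        mad = np.median(np.abs(x - np.median(x))) + 1e-9
        return x > np.median(x) + k * mad

    return robust_outliers(residual) | robust_outliers(speed)
```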

10 pages, 9962 KB  
Article
Leveraging the Generalization Ability of Deep Convolutional Neural Networks for Improving Classifiers for Color Fundus Photographs
by Jaemin Son, Jaeyoung Kim, Seo Taek Kong and Kyu-Hwan Jung
Appl. Sci. 2021, 11(2), 591; https://doi.org/10.3390/app11020591 - 9 Jan 2021
Cited by 10 | Viewed by 3794
Abstract
Deep learning demands a large amount of annotated data, and the annotation task is often crowdsourced for economic efficiency. When the annotation task is delegated to non-experts, the dataset may contain inaccurately labelled data. Noisy labels not only yield classification models with sub-optimal performance, but may also impede their optimization dynamics. In this work, we propose exploiting the pattern recognition capacity of deep convolutional neural networks to filter out supposedly mislabeled cases while training. We suggest a training method that references softmax outputs to judge the correctness of the given labels. This approach achieved outstanding performance compared with existing methods in various noise settings on a large-scale dataset (Kaggle 2015 Diabetic Retinopathy). Furthermore, we demonstrate a method for mining positive cases from a pool of unlabeled images by exploiting this generalization ability. With this method, we won first place on the offsite validation dataset in the pathological myopia classification challenge (PALM), achieving an AUROC of 0.9993 in the final submission. Source code is publicly available.
(This article belongs to the Special Issue Biomedical Engineering Applications in Vision Science)
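
A hedged sketch of the general filtering principle the abstract describes, referencing softmax outputs to judge label correctness during training: batch samples whose given label receives very low probability are excluded from the loss. The threshold and the per-batch formulation are assumptions, not the authors' exact training rule.

```python
# Illustrative per-batch label filter for noisy-label training (PyTorch).
import torch
import torch.nn.functional as F

def filtered_cross_entropy(logits: torch.Tensor, labels: torch.Tensor,
                           min_prob: float = 0.05) -> torch.Tensor:
    """Cross-entropy over the subset of the batch whose label looks plausible."""
    with torch.no_grad():
        probs = F.softmax(logits, dim=1)
        label_prob = probs.gather(1, labels.unsqueeze(1)).squeeze(1)
        keep = label_prob >= min_prob      # treat the rest as likely mislabeled
    if keep.sum() == 0:                    # degenerate case: nothing trustworthy in batch
        return logits.sum() * 0.0          # zero loss that keeps the graph intact
    return F.cross_entropy(logits[keep], labels[keep])
```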

26 pages, 847 KB  
Article
Enhanced Label Noise Filtering with Multiple Voting
by Donghai Guan, Maqbool Hussain, Weiwei Yuan, Asad Masood Khattak, Muhammad Fahim and Wajahat Ali Khan
Appl. Sci. 2019, 9(23), 5031; https://doi.org/10.3390/app9235031 - 21 Nov 2019
Viewed by 2270
Abstract
Label noise exists in many applications, and its presence can degrade learning performance. Researchers usually use filters to identify and eliminate noisy labels prior to training. The ensemble-learning-based filter (EnFilter) is the most widely used. Depending on the voting mechanism, EnFilter is mainly divided into two types: single-voting based (SVFilter) and multiple-voting based (MVFilter). MVFilter is generally preferred because multiple voting addresses the intrinsic limitations of single voting. However, the most important unsolved issue in MVFilter is how to determine the optimal decision point (ODP). Conceptually, the decision point is a threshold value that determines noise detection performance. To maximize the performance of MVFilter, we propose a novel approach to compute the optimal decision point. Our approach is data-driven and cost-sensitive: it determines the ODP from the given noisy training dataset and a noise misrecognition cost matrix. The core idea is to estimate the mislabeled-data probability distributions, from which the expected cost of each possible decision point can be inferred. Experimental results on a set of benchmark datasets illustrate the utility of the proposed approach.
(This article belongs to the Section Computing and Artificial Intelligence)
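
The sketch below illustrates one way a cost-sensitive decision point could be chosen for a multiple-voting filter: given estimated vote-count distributions for clean and mislabeled samples, a noise rate, and misrecognition costs, it returns the vote threshold with the lowest expected cost. The inputs and cost model are assumptions for illustration, not the paper's estimation procedure.

```python
# Illustrative expected-cost search for a multiple-voting filter's decision point.
import numpy as np

def optimal_decision_point(p_votes_clean: np.ndarray, p_votes_noisy: np.ndarray,
                           noise_rate: float, cost_fp: float, cost_fn: float) -> int:
    """p_votes_*: P(#flagging-votes = v | clean / mislabeled), for v = 0..n_voters.
    cost_fp: cost of discarding a clean sample; cost_fn: cost of keeping a noisy one."""
    n_voters = len(p_votes_clean) - 1
    costs = []
    for t in range(n_voters + 1):                 # flag a sample if votes >= t
        p_flag_clean = p_votes_clean[t:].sum()    # false-positive probability
        p_keep_noisy = p_votes_noisy[:t].sum()    # false-negative probability
        costs.append((1 - noise_rate) * p_flag_clean * cost_fp
                     + noise_rate * p_keep_noisy * cost_fn)
    return int(np.argmin(costs))                  # threshold with the lowest expected cost
```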
11 pages, 3181 KB  
Article
sEst: Accurate Sex-Estimation and Abnormality Detection in Methylation Microarray Data
by Chol-Hee Jung, Daniel J. Park, Peter Georgeson, Khalid Mahmood, Roger L. Milne, Melissa C. Southey and Bernard J. Pope
Int. J. Mol. Sci. 2018, 19(10), 3172; https://doi.org/10.3390/ijms19103172 - 15 Oct 2018
Cited by 7 | Viewed by 4773
Abstract
DNA methylation influences predisposition, development, and prognosis for many diseases, including cancer. In methylation studies, it is not uncommon to encounter samples with incorrect sex labelling or atypical sex chromosome arrangements. Sex is one of the strongest influences on the genomic distribution of DNA methylation, and, therefore, correct assignment of sex and filtering of abnormal samples are essential for the quality control of study data. Differences in sex chromosome copy numbers between sexes and X-chromosome inactivation in females result in distinctive sex-specific patterns in the distribution of DNA methylation levels. In this study, we present a software tool, sEst, which incorporates clustering analysis to infer sex and to detect sex-chromosome abnormalities from DNA methylation microarray data. Testing with two publicly available datasets demonstrated that sEst not only correctly inferred the sex of the test samples, but also identified mislabelled samples and samples with potential sex-chromosome abnormalities, such as Klinefelter syndrome and Turner syndrome, the latter being a feature not offered by existing methods. Given that sex and sex-chromosome abnormalities can have large effects on many phenotypes, including diseases, our method can make a significant contribution to DNA methylation studies based on microarray platforms.
(This article belongs to the Section Molecular Genetics and Genomics)
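
As a loose illustration of the clustering idea, the sketch below clusters samples on two hypothetical summary features (e.g., mean chrX and chrY signal), flags samples whose inferred sex disagrees with the recorded label, and marks samples far from both clusters as possibly atypical. Feature construction and thresholds are assumptions; sEst's actual procedure is described in the paper.

```python
# Illustrative sketch only, not sEst itself.
import numpy as np
from sklearn.cluster import KMeans

def flag_sex_mismatches(xy_features: np.ndarray, recorded_sex: np.ndarray,
                        outlier_quantile: float = 0.99):
    """xy_features: (n_samples, 2), e.g. mean chrX and chrY signal; recorded_sex: 0/1."""
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(xy_features)
    inferred = km.labels_
    # Align cluster ids with the recorded sex coding by majority vote.
    if np.mean(inferred == recorded_sex) < 0.5:
        inferred = 1 - inferred
    dist = np.min(km.transform(xy_features), axis=1)        # distance to nearest centre
    atypical = dist > np.quantile(dist, outlier_quantile)   # possible chrX/Y abnormality
    mismatched = inferred != recorded_sex                   # possibly mislabelled sex
    return mismatched, atypical
```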