
3,836 Results Found

  • Article
  • Open Access
123 Citations
10,888 Views
15 Pages

Data Sampling Methods to Deal With the Big Data Multi-Class Imbalance Problem

  • Eréndira Rendón,
  • Roberto Alejo,
  • Carlos Castorena,
  • Frank J. Isidro-Ortega and
  • Everardo E. Granda-Gutiérrez

14 February 2020

The class imbalance problem has been a hot topic in the machine learning community in recent years. Nowadays, in the era of big data and deep learning, this problem remains relevant. Much work has been performed to deal with the class imbalance proble...

  • Article
  • Open Access
23 Citations
8,971 Views
31 Pages

A Survey of Methods for Addressing Imbalance Data Problems in Agriculture Applications

  • Tajul Miftahushudur,
  • Halil Mertkan Sahin,
  • Bruce Grieve and
  • Hujun Yin

29 January 2025

This survey explores recent advances in addressing class imbalance issues for developing machine learning models in precision agriculture, with a focus on techniques used for plant disease detection, soil management, and crop classification. We exami...

  • Review
  • Open Access
82 Citations
17,903 Views
16 Pages

12 June 2022

Recent technological developments have led to an increase in the size and types of data in the medical field derived from multiple platforms such as proteomic, genomic, imaging, and clinical data. Many machine learning models have been developed to s...

  • Article
  • Open Access
9 Citations
3,662 Views
21 Pages

7 June 2022

Sensing the nighttime economy–housing imbalance is of great importance for urban planning and commerce. As an efficient tool of social sensing and human observation, mobile phone data provides an effective way to address this issue. In this pap...

  • Article
  • Open Access
2 Citations
3,165 Views
12 Pages

9 August 2022

Bone age assessment (BAA) is an important indicator of child maturity. Generally, a person's bone age is evaluated mostly during the puberty stage; compared to the toddler and post-puberty stages, bone age data at the puberty stage are much easier to o...

  • Article
  • Open Access
4 Citations
2,464 Views
15 Pages

Data-Centric Solutions for Addressing Big Data Veracity with Class Imbalance, High Dimensionality, and Class Overlapping

  • Armando Bolívar,
  • Vicente García,
  • Roberto Alejo,
  • Rogelio Florencia-Juárez and
  • J. Salvador Sánchez

4 July 2024

An innovative strategy for organizations to obtain value from their large datasets, allowing them to guide future strategic actions and improve their initiatives, is the use of machine learning algorithms. This has led to a growing and rapid applicat...

  • Article
  • Open Access
19 Citations
5,490 Views
16 Pages

19 January 2023

Due to the distributed data collection and learning in federated learning, many clients conduct local training with non-independent and identically distributed (non-IID) datasets. Accordingly, training on these datasets results in severe perfo...

  • Feature Paper
  • Article
  • Open Access
2 Citations
1,623 Views
12 Pages

4 December 2022

We deal with the problem of class imbalance in data mining and machine learning classification algorithms. This is the case where some of the class labels are represented by a small number of examples in the training dataset compared to the rest of t...

  • Article
  • Open Access
4 Citations
2,097 Views
19 Pages

10 December 2024

With the escalating threat posed by network intrusions, the development of efficient intrusion detection systems (IDSs) has become imperative. This study focuses on improving detection performance in programmable logic controller (PLC) network securi...

  • Article
  • Open Access
4 Citations
3,609 Views
22 Pages

20 May 2023

Considering the sensitivity of data in medical scenarios, federated learning (FL) is suitable for applications that require data privacy. Medical personnel can use the FL framework for machine learning to assist in analyzing large-scale data that are...

  • Review
  • Open Access
34 Citations
18,201 Views
20 Pages

3 February 2025

With the rapid development of the Internet of Things (IoT) and large-scale distributed networks, Intrusion Detection Systems (IDS) face significant challenges in handling complex spatiotemporal features and addressing data imbalance issues. This articl...

  • Article
  • Open Access
2 Citations
1,382 Views
18 Pages

22 August 2024

The safety and reliability of high-speed train electric traction systems are crucial. However, the operating environment for China Railway High-speed (CRH) trains is challenging, with severe working conditions. Dataset imbalance further complicates f...

  • Article
  • Open Access
804 Views
23 Pages

24 November 2025

This study addresses the challenges of poorly annotated data and class imbalance in mental health detection from social media. We propose an integrated approach combining weak classifiers with gradient boosting, leveraging LSTM, pretrained Transforme...

  • Article
  • Open Access
20 Citations
3,457 Views
18 Pages

Combination of Feature Selection and Resampling Methods to Predict Preterm Birth Based on Electrohysterographic Signals from Imbalance Data

  • Félix Nieto-del-Amor,
  • Gema Prats-Boluda,
  • Javier Garcia-Casado,
  • Alba Diaz-Martinez,
  • Vicente Jose Diago-Almela,
  • Rogelio Monfort-Ortiz,
  • Dongmei Hao and
  • Yiyao Ye-Lin

7 July 2022

Due to its high sensitivity, electrohysterography (EHG) has emerged as an alternative technique for predicting preterm labor. The main obstacle in designing preterm labor prediction models is the inherent preterm/term imbalance ratio, which can give...

  • Article
  • Open Access
10 Citations
1,878 Views
27 Pages

5 June 2025

Battery health monitoring and remaining useful life (RUL) estimation for electric vehicles face two significant challenges: battery data heterogeneity and sample imbalance. This study presents a novel approach based on Transformer architecture to spe...

  • Article
  • Open Access
2,299 Views
22 Pages

An Empirical Analysis of Crash Injury Severity Among Young Drivers in England: Accounting for Data Imbalance

  • Amirhossein Taheri,
  • Kevin Switala,
  • Grigorios Fountas,
  • Abbas Sheykhfard,
  • Nima Dadashzadeh and
  • Steffen Müller

25 April 2025

Crash data analysis is key to improving road safety, but imbalanced data challenges accurate predictions for severe crashes, often leading to biased outcomes. This study investigates crash severity among young drivers (aged 17–24) in England, u...

  • Article
  • Open Access
5 Citations
2,783 Views
21 Pages

An open problem impeding the use of deep learning (DL) models for forecasting land cover (LC) changes is their bias toward persistent cells. By providing sample weights for model training, LC changes can be allocated greater influence in adjustments...

  • Article
  • Open Access
1 Citations
2,778 Views
24 Pages

16 December 2021

Models that can predict battery cells’ thermal and electrical behaviors are necessary for real-time battery management systems to regulate the imbalance within battery cells. This work introduces a Gaussian Process Regression (GPR)-based data-driven...

  • Article
  • Open Access
25 Citations
3,797 Views
14 Pages

4 March 2019

Hit-and-run (HR) crashes refer to crashes involving drivers of the offending vehicle fleeing incident scenes without aiding the possible victims or informing authorities for emergency medical services. This paper aims at identifying significant predi...

  • Article
  • Open Access
10 Citations
3,106 Views
26 Pages

10 February 2023

Safe and stable operation of the aircraft hydraulic system is of great significance to the flight safety of an aircraft. Any fault may be a threat to flight safety and may lead to enormous economic losses and even human casualties. Hence, the normal...

  • Article
  • Open Access
50 Citations
6,684 Views
16 Pages

A Hybrid Supervised Machine Learning Classifier System for Breast Cancer Prognosis Using Feature Selection and Data Imbalance Handling Approaches

  • Yogendra Singh Solanki,
  • Prasun Chakrabarti,
  • Michal Jasinski,
  • Zbigniew Leonowicz,
  • Vadim Bolshev,
  • Alexander Vinogradov,
  • Elzbieta Jasinska,
  • Radomir Gono and
  • Mohammad Nami

Nowadays, breast cancer is the most frequent cancer among women. Early detection is a critical issue that can be effectively achieved by machine learning (ML) techniques. Thus, in this article, the methods to improve the accuracy of ML classification...

  • Article
  • Open Access
2 Citations
1,933 Views
15 Pages

Anomaly Detection Using Puzzle-Based Data Augmentation to Overcome Data Imbalances and Deficiencies

  • Eunkyeong Kim,
  • Seunghwan Jung,
  • Minseok Kim,
  • Jinyong Kim,
  • Baekcheon Kim,
  • Jonggeun Kim and
  • Sungshin Kim

20 November 2023

Machine tools are used in a wide range of applications, and they can manufacture workpieces flexibly. Furthermore, they require maintenance; the overall costs include maintenance costs, which constitute a significant portion, and the costs involved i...

  • Article
  • Open Access
9 Citations
3,970 Views
19 Pages

The approach of federated learning (FL) addresses significant challenges, including access rights, privacy, security, and the availability of diverse data. However, edge devices produce and collect data in a non-independent and identically distribute...

  • Article
  • Open Access
1 Citations
3,242 Views
15 Pages

Disengagement of students during online learning significantly impacts the effectiveness of online education. Thus, accurately estimating when students are not engaged is a critical aspect of online-learning research. However, the inherent characteri...

  • Article
  • Open Access
2 Citations
3,696 Views
14 Pages

11 December 2023

Imbalanced data present a pervasive challenge in many real-world applications of statistical and machine learning, where the instances of one class significantly outnumber those of the other. This paper examines the impact of class imbalance on the p...

  • Article
  • Open Access
11 Citations
2,939 Views
15 Pages

12 December 2022

Synthetic aperture radar (SAR) ship recognition can obtain location and class information from SAR scene images, which is important in military and civilian fields and has recently become a major research focus. Limited by data conditi...

  • Article
  • Open Access
7 Citations
4,075 Views
13 Pages

Alleviating Class-Imbalance Data of Semiconductor Equipment Anomaly Detection Study

  • Da Hoon Seol,
  • Jeong Eun Choi,
  • Chan Young Kim and
  • Sang Jeen Hong

Plasma-based semiconductor processing is highly sensitive, so even minor changes in the procedure can have serious consequences. The monitoring and classification of these equipment anomalies can be performed using fault detection and classificatio...

  • Article
  • Open Access
4 Citations
3,552 Views
21 Pages

Data imbalance is a serious problem in machine learning that can be alleviated at the data level by balancing the class distribution with sampling. In the last decade, several sampling methods have been published to address the shortcomings of the in...

  • Review
  • Open Access
8 Citations
3,168 Views
23 Pages

Wind Turbine SCADA Data Imbalance: A Review of Its Impact on Health Condition Analyses and Mitigation Strategies

  • Adaiton Oliveira-Filho,
  • Monelle Comeau,
  • James Cave,
  • Charbel Nasr,
  • Pavel Côté and
  • Antoine Tahan

27 December 2024

The rapidly increasing installed capacity of Wind Turbines (WTs) worldwide emphasizes the need for Operation and Maintenance (O&M) strategies favoring high availability, reliability, and cost-effective operation. Optimal decision-making and plann...

  • Article
  • Open Access
19 Citations
3,664 Views
18 Pages

Identifying high-risk drivers before an accident happens is necessary for traffic accident control and prevention. Due to the class-imbalance nature of driving data, high-risk samples as the minority class are usually ill-treated by standard classifi...

  • Article
  • Open Access
2 Citations
2,957 Views
19 Pages

Data Are Power: Addressing the Power Imbalance Around Community Data with the Open-Access Data4HumanRights Curriculum

  • Monika Kuffer,
  • Dana R. Thomson,
  • Dianne Wakonyo,
  • Nicera Wanjiru Kimani,
  • Divyani Kohli-Poll Jonker,
  • Enyo Okoko,
  • Rasak Toheeb,
  • Bisola Akinmuyiwa,
  • Mohammed Zanna and
  • Andrew Maki
  • + 1 author

3 February 2025

Data4HumanRights’ training materials have been developed as open-source and tailored to limited-resource settings, where community data collectors often live and work. Access to training on data collection, analysis, and visualisation to suppor...

  • Article
  • Open Access
1 Citations
3,219 Views
18 Pages

Paper recommendation systems are important for alleviating academic information overload. Such systems provide personalized recommendations based on implicit feedback from users, supplemented by their subject information, citation networks, etc. Howe...

  • Article
  • Open Access
2 Citations
4,983 Views
13 Pages

The analysis in this study covers how power imbalance, alliance cohesion, diplomatic and media framing, and big data analytics affect the escalation of conflict in a multipolar world. This research applies the Constructivist International Relations T...

  • Article
  • Open Access
3 Citations
2,594 Views
13 Pages

11 March 2024

Terrorism poses a significant threat to international peace and stability. The ability to predict potential casualties resulting from terrorist attacks, based on specific attack characteristics, is vital for protecting the safety of innocent civilian...

  • Article
  • Open Access
2 Citations
2,910 Views
17 Pages

Continuously acquired biosignals from patient monitors contain significant amounts of unusable data. During the development of a decision support system based on continuously acquired biosignals, we developed machine and deep learning algorithms to a...

  • Article
  • Open Access
29 Citations
8,129 Views
15 Pages

The recent introduction of smart manufacturing, also called the ‘smart factory’, has made it possible to collect large amounts of multivariate data from Internet of Things devices or sensors. Quality control using these data in th...

  • Article
  • Open Access
2,208 Views
13 Pages

11 September 2023

Current age estimation datasets often have a skewed long-tail distribution with significant data imbalance, rather than an ideal uniform distribution for each category. The existing age estimation algorithms that rely on label distribution do not lev...

  • Article
  • Open Access
19 Citations
3,467 Views
32 Pages

29 August 2024

Electrocardiography (ECG) plays a pivotal role in monitoring cardiac health, yet the manual analysis of ECG signals is challenging due to the complex task of identifying and categorizing various waveforms and morphologies within the data. Additionall...

  • Article
  • Open Access
1 Citations
2,559 Views
12 Pages

7 May 2022

Lifelogs are generated in our daily lives and contain useful information for health monitoring. Nowadays, one can easily obtain various lifelogs from a wearable device such as a smartwatch. These lifelogs could include noise and outliers. In general,...

  • Article
  • Open Access
10 Citations
3,072 Views
22 Pages

Phase Imbalance Analysis of GF-3 Along-Track InSAR Data for Ocean Current Measurement

  • Junxin Yang,
  • Xinzhe Yuan,
  • Bing Han,
  • Liangbo Zhao,
  • Jili Sun,
  • Mingyang Shang,
  • Xiaochen Wang and
  • Chibiao Ding

14 January 2021

There are two useful methods of current measurement based on synthetic aperture radar (SAR): one is along-track interferometry (ATI), and the other is Doppler centroid analysis (DCA). For the ATI method, the interferometric phase must be accurate eno...

  • Article
  • Open Access
460 Views
28 Pages

Industrial control systems (ICSs) are increasingly interconnected with enterprise IT networks and remote services, which expands the attack surface of operational technology (OT) environments. However, collecting sufficient attack traffic from real O...

  • Article
  • Open Access
47 Citations
8,691 Views
14 Pages

Power quality studies for distribution networks are very important for future network expansions realized by utility companies, so the accuracy of such studies is critical. Load data, including information on load imbalance, could have in many situat...

  • Article
  • Open Access
6 Citations
2,990 Views
21 Pages

Applying Machine Learning Sampling Techniques to Address Data Imbalance in a Chilean COVID-19 Symptoms and Comorbidities Dataset

  • Pablo Ormeño-Arriagada,
  • Gastón Márquez,
  • David Araya,
  • Carla Rimassa and
  • Carla Taramasco

23 January 2025

Reliably detecting COVID-19 is critical for diagnosis and disease control. However, imbalanced data in medical datasets pose significant challenges for machine learning models, leading to bias and poor generalization. The dataset obtained from the EP...

  • Article
  • Open Access
18 Citations
6,794 Views
14 Pages

21 August 2023

Classification problems caused by data imbalance occur in many fields and have long been studied in machine learning. Many real-world datasets suffer from class imbalance, which occurs when the sizes of classes are not uniform; th...

  • Article
  • Open Access
59 Citations
8,112 Views
16 Pages

Generative Oversampling Method for Imbalanced Data on Bearing Fault Detection and Diagnosis

  • Sungho Suh,
  • Haebom Lee,
  • Jun Jo,
  • Paul Lukowicz and
  • Yong Oh Lee

20 February 2019

In this study, we developed a novel data-driven fault detection and diagnosis (FDD) method for bearing faults in induction motors where the fault condition data are imbalanced. First, we propose a bearing fault detector based on convolutional neural...

  • Article
  • Open Access
8 Citations
3,066 Views
16 Pages

Counteracting Data Bias and Class Imbalance—Towards a Useful and Reliable Retinal Disease Recognition System

  • Adam R. Chłopowiec,
  • Konrad Karanowski,
  • Tomasz Skrzypczak,
  • Mateusz Grzesiuk,
  • Adrian B. Chłopowiec and
  • Martin Tabakov

Multiple studies have presented satisfactory performance for the treatment of various ocular diseases. To date, no study has described a medically accurate multiclass model trained on a large, diverse dataset. No study has addressed a...

  • Article
  • Open Access
12 Citations
3,275 Views
14 Pages

16 December 2021

More and more Android application developers are adopting methods against reverse engineering, such as adding a shell, so that certain features cannot be obtained through decompilation, which causes a serious sample imbalance...

  • Article
  • Open Access
7 Citations
4,414 Views
13 Pages

20 October 2022

Semi-supervised learning (SSL) is a popular research area in machine learning which utilizes both labeled and unlabeled data. As an important method for the generation of artificial hard labels for unlabeled data, the pseudo-labeling method is introd...

  • Article
  • Open Access
4 Citations
3,202 Views
9 Pages

Data Balancing Based on Pre-Training Strategy for Liver Segmentation from CT Scans

  • Yong Zhang,
  • Yi Wang,
  • Yizhu Wang,
  • Bin Fang,
  • Wei Yu,
  • Hongyu Long and
  • Hancheng Lei

2 May 2019

Data imbalance is often encountered in deep learning and is harmful to model training. The imbalance of hard and easy samples in training datasets often occurs in segmentation tasks on computed tomography (CT) scans. However, due to the...

  • Article
  • Open Access
3 Citations
2,497 Views
32 Pages

An Adaptive Active Learning Method for Multiclass Imbalanced Data Streams with Concept Drift

  • Meng Han,
  • Chunpeng Li,
  • Fanxing Meng,
  • Feifei He and
  • Ruihua Zhang

15 August 2024

Learning from multiclass imbalanced data streams with concept drift and variable class imbalance ratios under a limited label budget presents new challenges in the field of data mining. To address these challenges, this paper proposes an adaptive act...
