187 Results Found

  • Review
  • Open Access
6 Citations
2,744 Views
20 Pages

Handling the Imbalanced Problem in Agri-Food Data Analysis

  • Adeyemi O. Adegbenjo and
  • Michael O. Ngadi

17 October 2024

Imbalanced data situations exist in most fields of endeavor. The problem has been identified as a major bottleneck in machine learning/data mining and is becoming a serious issue of concern in food processing applications. Inappropriate analysis of a...

  • Article
  • Open Access
31 Citations
4,606 Views
16 Pages

9 April 2023

The Air Quality Index (AQI) dataset contains information on measurements of pollutants and ambient air quality conditions at a certain location that can be used to predict air quality. Unfortunately, this dataset often has many missing observations and...

  • Article
  • Open Access
5 Citations
3,533 Views
20 Pages

Analyzing the Effectiveness of Imbalanced Data Handling Techniques in Predicting Driver Phone Use

  • Madhar M. Taamneh,
  • Salah Taamneh,
  • Ahmad H. Alomari and
  • Musab Abuaddous

6 July 2023

Distracted driving leads to a significant number of road crashes worldwide. Smartphone use is one of the most common causes of cognitive distraction among drivers. Available data on drivers’ phone use presents an invaluable opportunity to ident...

  • Entry
  • Open Access
48 Citations
9,161 Views
16 Pages

27 November 2024

The increasing complexity of social science data and phenomena necessitates using advanced analytical techniques to capture nonlinear relationships that traditional linear models often overlook. This chapter explores the application of machine learni...

  • Article
  • Open Access
137 Citations
12,587 Views
24 Pages

Crash severity is undoubtedly a fundamental aspect of a crash event. Although machine learning algorithms for predicting crash severity have recently gained interest from the academic community, there is a significant trend towards neglecting the fact...

  • Article
  • Open Access
19 Citations
4,389 Views
19 Pages

23 April 2020

The developments in the fields of industrial Internet of Things (IIoT) and big data technologies have made it possible to collect a lot of meaningful industrial process and quality-based data. The gathered data are analyzed using contemporary statist...

  • Article
  • Open Access
4 Citations
2,682 Views
23 Pages

14 February 2022

Class imbalance is a phenomenon of asymmetry that degrades the performance of traditional classification algorithms such as the Support Vector Machine (SVM) and Extreme Learning Machine (ELM). Various modifications of SVM and ELM have been proposed t...

  • Article
  • Open Access
1 Citation
1,827 Views
33 Pages

30 May 2025

Women exhibit marked physiological transformations in pregnancy, mandating regular and holistic assessment. Maternal and fetal vitality is governed by a spectrum of clinical, demographic, and lifestyle factors throughout this critical period. The exi...

  • Article
  • Open Access
6 Citations
2,371 Views
32 Pages

11 March 2025

The accurate prediction of brain stroke is critical for effective diagnosis and management, yet the imbalanced nature of medical datasets often hampers the performance of conventional machine learning models. To address this challenge, we propose a n...

  • Article
  • Open Access
15 Citations
4,248 Views
29 Pages

13 July 2023

Object classification in hyperspectral images involves accurately categorizing objects based on their spectral characteristics. However, the high dimensionality of hyperspectral data and class imbalance pose significant challenges to object classific...

  • Article
  • Open Access
37 Citations
5,061 Views
24 Pages

30 July 2020

Human activity recognition has become essential to a wide range of applications, such as smart home monitoring, health care, and surveillance. However, it is challenging to deliver a sufficiently robust human activity recognition system from raw sensor d...

  • Article
  • Open Access
1,126 Views
20 Pages

3 July 2025

Test-time adaptation (TTA) enhances model performance in target domains by dynamically adjusting parameters using unlabeled test data. However, existing TTA methods typically assume balanced data distributions, whereas real-world test data is often i...

  • Article
  • Open Access
140 Citations
10,500 Views
28 Pages

Class Imbalance Ensemble Learning Based on the Margin Theory

  • Wei Feng,
  • Wenjiang Huang and
  • Jinchang Ren

18 May 2018

The proportion of instances belonging to each class in a data-set plays an important role in machine learning. However, real-world data often suffer from class imbalance. Dealing with multi-class tasks with different misclassification costs of cl...

  • Article
  • Open Access
3 Citations
2,862 Views
28 Pages

Identify High-Impact Bug Reports by Combining the Data Reduction and Imbalanced Learning Strategies

  • Shikai Guo,
  • Miaomiao Wei,
  • Siwen Wang,
  • Rong Chen,
  • Chen Guo,
  • Hui Li and
  • Tingting Li

4 September 2019

As software systems become increasingly large, the logic becomes more complex, resulting in a large number of bug reports being submitted to the bug repository daily. Due to tight schedules and limited human resources, developers may not have enough...

  • Article
  • Open Access
4 Citations
1,970 Views
16 Pages

Cost-Sensitive Models to Predict Risk of Cardiovascular Events in Patients with Chronic Heart Failure

  • Maria Carmela Groccia,
  • Rosita Guido,
  • Domenico Conforti,
  • Corrado Pelaia,
  • Giuseppe Armentaro,
  • Alfredo Francesco Toscani,
  • Sofia Miceli,
  • Elena Succurro,
  • Marta Letizia Hribal and
  • Angela Sciacqua

3 October 2023

Chronic heart failure (CHF) is a clinical syndrome characterised by symptoms and signs due to structural and/or functional abnormalities of the heart. CHF confers risk for cardiovascular deterioration events which cause recurrent hospitalisations and...

  • Article
  • Open Access
4 Citations
2,530 Views
32 Pages

An Adaptive Active Learning Method for Multiclass Imbalanced Data Streams with Concept Drift

  • Meng Han,
  • Chunpeng Li,
  • Fanxing Meng,
  • Feifei He and
  • Ruihua Zhang

15 August 2024

Learning from multiclass imbalanced data streams with concept drift and variable class imbalance ratios under a limited label budget presents new challenges in the field of data mining. To address these challenges, this paper proposes an adaptive act...

  • Article
  • Open Access
2 Citations
2,466 Views
16 Pages

31 March 2023

Class imbalance is a prevalent problem that not only reduces the performance of machine learning techniques but also causes a lack of the inherent complex characteristics of data. Though researchers have proposed various ways to deal wit...

  • Article
  • Open Access
139 Citations
19,229 Views
34 Pages

Effective Class-Imbalance Learning Based on SMOTE and Convolutional Neural Networks

  • Javad Hassannataj Joloudari,
  • Abdolreza Marefat,
  • Mohammad Ali Nematollahi,
  • Solomon Sunday Oyelere and
  • Sadiq Hussain

21 March 2023

Imbalanced Data (ID) is a problem that deters Machine Learning (ML) models from achieving satisfactory results. ID is the occurrence of a situation where the quantity of the samples belonging to one class outnumbers that of the other by a wide margin...

  • Article
  • Open Access
1 Citation
1,859 Views
21 Pages

13 October 2023

Imbalanced data are ubiquitous in many real-world applications, and they have drawn a significant amount of attention in the field of data mining. A variety of methods have been proposed for imbalanced data classification, and data sampling methods a...

  • Article
  • Open Access
17 Citations
4,094 Views
23 Pages

Along with the rapid demographic change, there has been increased attention to the risk of vehicle crashes relative to older drivers. Due to senior involvement and their physical vulnerability, it is crucial to develop models that accurately predict...

  • Article
  • Open Access
2 Citations
1,901 Views
12 Pages

7 February 2024

Imbalanced class data are commonly observed in pattern analysis, machine learning, and various real-world applications. Conventional approaches often resort to resampling techniques in order to address the imbalance, which inevitably alter the origin...

  • Article
  • Open Access
4 Citations
2,097 Views
19 Pages

11 February 2023

Adaptive machine learning has increasing importance due to its ability to classify a data stream and handle the changes in the data distribution. Various resources, such as wearable sensors and medical devices, can generate a data stream with an imba...

  • Article
  • Open Access
11 Citations
2,747 Views
26 Pages

30 September 2024

The Industrial Internet of Things (IIoT) deals with vast amounts of data that must be safeguarded against tampering or theft. Identifying rare attacks and addressing data imbalances pose significant challenges in the detection of IIoT cyberattacks. I...

  • Article
  • Open Access
3 Citations
1,870 Views
17 Pages

Utilizing Nearest-Neighbor Clustering for Addressing Imbalanced Datasets in Bioengineering

  • Chih-Ming Huang,
  • Chun-Hung Lin,
  • Chuan-Sheng Hung,
  • Wun-Hui Zeng,
  • You-Cheng Zheng and
  • Chih-Min Tsai

Imbalance classification is common in scenarios like fault diagnosis, intrusion detection, and medical diagnosis, where obtaining abnormal data is difficult. This article addresses a one-class problem, implementing and refining the One-Class Nearest-...

  • Article
  • Open Access
13 Citations
2,078 Views
16 Pages

2 December 2024

Background/Objectives: This study examines the effectiveness of different resampling methods and classifier models for handling imbalanced datasets, with a specific focus on critical healthcare applications such as cancer diagnosis and prognosis. Met...

  • Article
  • Open Access
311 Citations
47,579 Views
15 Pages

16 January 2023

Educational data mining is capable of producing useful data-driven applications (e.g., early warning systems in schools or the prediction of students’ academic achievement) based on predictive models. However, the class imbalance problem in edu...

  • Article
  • Open Access
5 Citations
3,114 Views
29 Pages

Generation of Controlled Synthetic Samples and Impact of Hyper-Tuning Parameters to Effectively Classify the Complex Structure of Overlapping Region

  • Zafar Mahmood,
  • Naveed Anwer Butt,
  • Ghani Ur Rehman,
  • Muhammad Zubair,
  • Muhammad Aslam,
  • Afzal Badshah and
  • Syeda Fizzah Jilani

22 August 2022

The classification of imbalanced and overlapping data has provided customary insight over the last decade, as most real-world applications comprise multiple classes with an imbalanced distribution of samples. Samples from different classes overlap ne...

  • Article
  • Open Access
34 Citations
4,358 Views
14 Pages

Classification of Imbalanced Travel Mode Choice to Work Data Using Adjustable SVM Model

  • Yufeng Qian,
  • Mahdi Aghaabbasi,
  • Mujahid Ali,
  • Muwaffaq Alqurashi,
  • Bashir Salah,
  • Rosilawati Zainol,
  • Mehdi Moeinaddini and
  • Enas E. Hussein

15 December 2021

The investigation of travel mode choice is an essential task in transport planning and policymaking for predicting travel demands. Typically, mode choice datasets are imbalanced and learning from such datasets is challenging. This study deals with im...

  • Article
  • Open Access
17 Citations
4,883 Views
18 Pages

15 April 2021

Human activity recognition (HAR) is the study of the identification of specific human movement and action based on images, accelerometer data and inertial measurement unit (IMU) sensors. In the sensor-based HAR application, most of the researchers use...

  • Article
  • Open Access
49 Citations
5,958 Views
26 Pages

Wind Turbine Fault Detection Using Highly Imbalanced Real SCADA Data

  • Cristian Velandia-Cardenas,
  • Yolanda Vidal and
  • Francesc Pozo

20 March 2021

Wind power is cleaner and less expensive compared to other alternative sources, and it has therefore become one of the most important energy sources worldwide. However, challenges related to the operation and maintenance of wind farms significantly c...

  • Article
  • Open Access
2 Citations
2,070 Views
15 Pages

14 December 2024

In high-dimensional machine learning tasks, supervised feature extraction is essential for improving model performance, with Linear Discriminant Analysis (LDA) being a common approach. However, LDA tends to deliver suboptimal performance when dealing...

  • Article
  • Open Access
2,636 Views
19 Pages

5 November 2023

Acoustic sensing provides crucial data for anomalous sound detection (ASD) in condition monitoring. However, building a robust acoustic-sensing-based ASD system is challenging due to the unsupervised nature of training data, which only contain normal...

  • Article
  • Open Access
234 Views
24 Pages

Support Vector Machine (SVM) is a popular kernel-based method for data classification that has demonstrated high efficiency across a wide range of practical applications. However, SVM suffers from several limitations, including the potential failure...

  • Article
  • Open Access
7 Citations
3,170 Views
21 Pages

Application of the Gravitational Search Algorithm for Constructing Fuzzy Classifiers of Imbalanced Data

  • Marina Bardamova,
  • Ilya Hodashinsky,
  • Anton Konev and
  • Alexander Shelupanov

28 November 2019

The presence of imbalance in data significantly complicates the classification task, including fuzzy systems. Due to a large number of instances of bigger classes, instances of smaller classes are not recognized correctly. Therefore, additional tools...

  • Article
  • Open Access
1 Citation
2,943 Views
22 Pages

1 August 2024

Deep learning is crucial in marine logistics and container crane error detection, diagnosis, and prediction. A novel deep learning technique using Long Short-Term Memory (LSTM) detected and anticipated errors in a system with imbalanced data. The LST...

  • Article
  • Open Access
3 Citations
2,931 Views
29 Pages

16 February 2025

The aviation industry generates vast amounts of data across multiple stakeholders, but critical faults and anomalies occur rarely, creating inherently imbalanced datasets that complicate machine learning applications. Traditional centralized approach...

  • Article
  • Open Access
19 Citations
6,481 Views
32 Pages

Big Data-Driven Distributed Machine Learning for Scalable Credit Card Fraud Detection Using PySpark, XGBoost, and CatBoost

  • Leonidas Theodorakopoulos,
  • Alexandra Theodoropoulou,
  • Anastasios Tsimakis and
  • Constantinos Halkiopoulos

This study presents an optimization for a distributed machine learning framework to achieve credit card fraud detection scalability. Due to the growth in fraudulent activities, this research implements the PySpark-based processing of large-scale tran...

  • Article
  • Open Access
98 Citations
7,027 Views
21 Pages

LSTM and Bat-Based RUSBoost Approach for Electricity Theft Detection

  • Muhammad Adil,
  • Nadeem Javaid,
  • Umar Qasim,
  • Ibrar Ullah,
  • Muhammad Shafiq and
  • Jin-Ghoo Choi

25 June 2020

The electrical losses in power systems are divided into non-technical losses (NTLs) and technical losses (TLs). NTL is more harmful than TL because it includes electricity theft, faulty meters and billing errors. It is one of the major concerns in th...

  • Article
  • Open Access
4 Citations
2,405 Views
30 Pages

A Broad TSK Fuzzy Classifier with a Simplified Set of Fuzzy Rules for Class-Imbalanced Learning

  • Jinghong Zhang,
  • Yingying Li,
  • Bowen Liu,
  • Hao Chen,
  • Jie Zhou,
  • Hualong Yu and
  • Bin Qin

13 October 2023

With the expansion of data scale and diversity, the issue of class imbalance has become increasingly salient. The current methods, including oversampling and under-sampling, exhibit limitations in handling complex data, leading to overfitting, loss o...

  • Article
  • Open Access
6 Citations
2,855 Views
11 Pages

Geomagnetic field data have been found to contain earthquake (EQ) precursory signals; however, analyzing this high-resolution, imbalanced data presents challenges when implementing machine learning (ML). This study explored the feasibility of principal c...

  • Article
  • Open Access
18 Citations
7,371 Views
26 Pages

14 July 2024

Credit evaluation has always been an important part of the financial field. The existing credit evaluation methods have difficulty in solving the problems of redundant data features and imbalanced samples. In response to the above issues, an ensemble...

  • Article
  • Open Access
2 Citations
1,938 Views
30 Pages

An Intelligent Kick Detection Model for Large-Hole Ultra-Deep Wells in the Sichuan Basin

  • Xudong Wang,
  • Pengcheng Wu,
  • Ye Chen,
  • Ergang Zhang,
  • Xiaoke Ye,
  • Qi Huang,
  • Chi Peng and
  • Jianhong Fu

18 November 2024

The Sichuan Basin has abundant deep and ultra-deep natural gas resources, making it a primary target for exploration and the development of China’s oil and gas industry. However, during the drilling of ultra-deep wells in the Sichuan Basin, com...

  • Article
  • Open Access
2 Citations
6,230 Views
16 Pages

13 December 2024

Stroke prediction is a vital research area due to its significant implications for public health. This comparative study offers a detailed evaluation of algorithmic methodologies and outcomes from three recent prominent studies on stroke prediction....

  • Article
  • Open Access
177 Citations
11,382 Views
21 Pages

23 April 2022

Data-driven methods have prominently featured in the progressive research and development of modern condition monitoring systems for electrical machines. These methods have the advantage of simplicity when it comes to the implementation of effective...

  • Article
  • Open Access
7 Citations
2,589 Views
16 Pages

21 June 2022

Cyber security is identified as an emerging concern for information technology management in business and society, owing to swift advances in telecommunication and wireless technologies. Cyberspace security has had a tremendous impact on numerous cru...

  • Article
  • Open Access
6 Citations
5,619 Views
20 Pages

Life Insurance Prediction and Its Sustainability Using Machine Learning Approach

  • Siti Nurasyikin Shamsuddin,
  • Noriszura Ismail and
  • R. Nur-Firyal

7 July 2023

Owning life insurance coverage that is not enough to pay for the expenses is called underinsurance, and it has been found to have a significant influence on the sustainability and financial health of families. However, insurance companies need to hav...

  • Article
  • Open Access
40 Citations
6,195 Views
16 Pages

Week-Wise Student Performance Early Prediction in Virtual Learning Environment Using a Deep Explainable Artificial Intelligence

  • Hsing-Chung Chen,
  • Eko Prasetyo,
  • Shian-Shyong Tseng,
  • Karisma Trinanda Putra,
  • Prayitno,
  • Sri Suning Kusumawardani and
  • Chien-Erh Weng

11 February 2022

Early prediction of students’ learning performance and analysis of student behavior in a virtual learning environment (VLE) are crucial to minimize the high failure rate in online courses during the COVID-19 pandemic. Nevertheless, traditional...

  • Article
  • Open Access
5 Citations
2,223 Views
22 Pages

30 September 2022

Deep learning-related technologies have achieved remarkable success in the field of intelligent fault diagnosis. Nevertheless, the traditional intelligent diagnosis methods are often based on the premise of sufficient annotation signals and balanced...

  • Article
  • Open Access
3 Citations
1,779 Views
19 Pages

A Novel SHAP-GAN Network for Interpretable Ovarian Cancer Diagnosis

  • Jingxun Cai,
  • Zne-Jung Lee,
  • Zhihxian Lin and
  • Ming-Ren Yang

6 March 2025

Ovarian cancer stands out as one of the most formidable adversaries in women’s health, largely due to its typically subtle and nonspecific early symptoms, which pose significant challenges to early detection and diagnosis. Although existing dia...

  • Article
  • Open Access
2 Citations
2,672 Views
16 Pages

Boundary-Aware Hashing for Hamming Space Retrieval

  • Wenjin Hu,
  • Yukun Chen,
  • Lifang Wu,
  • Ge Shi and
  • Meng Jian

5 January 2022

Hamming space retrieval is a hot area of research in deep hashing because it is effective for large-scale image retrieval. Existing hashing algorithms have not fully used the absolute boundary to discriminate the data inside and outside the Hamming b...
