78 Results Found

  • Article
  • Open Access
131 Citations
9,607 Views
19 Pages

Water Quality Prediction Using KNN Imputer and Multilayer Perceptron

  • Afaq Juna,
  • Muhammad Umer,
  • Saima Sadiq,
  • Hanen Karamti,
  • Ala’ Abdulmajid Eshmawi,
  • Abdullah Mohamed and
  • Imran Ashraf

23 August 2022

The rapid development to accommodate population growth has a detrimental effect on water quality, which is deteriorating. Consequently, water quality prediction has emerged as a topic of great interest during the past decade. Existing water quality p...

  • Article
  • Open Access
51 Citations
4,721 Views
19 Pages

Improving Prediction of Cervical Cancer Using KNN Imputed SMOTE Features and Multi-Model Ensemble Learning Approach

  • Hanen Karamti,
  • Raed Alharthi,
  • Amira Al Anizi,
  • Reemah M. Alhebshi,
  • Ala’ Abdulmajid Eshmawi,
  • Shtwai Alsubai and
  • Muhammad Umer

4 September 2023

Objective: Cervical cancer ranks among the top causes of death among females in developing countries. The most important procedures that should be followed to guarantee the minimizing of cervical cancer’s aftereffects are early identification a...

  • Article
  • Open Access
1 Citation
933 Views
37 Pages

27 May 2025

In the process of collecting operational data for the performance analysis of water-cooled centrifugal chillers, missing values are inevitable due to various factors such as sensor errors, data transmission failures, and failure of the measurement sy...

  • Article
  • Open Access
24 Citations
5,723 Views
22 Pages

A Comparison of Imputation Approaches for Estimating Forest Biomass Using Landsat Time-Series and Inventory Data

  • Trung H. Nguyen,
  • Simon Jones,
  • Mariela Soto-Berelov,
  • Andrew Haywood and
  • Samuel Hislop

17 November 2018

The prediction of forest biomass at the landscape scale can be achieved by integrating data from field plots with satellite imagery, in particular data from the Landsat archive, using k-nearest neighbour (kNN) imputation models. While studies have de...

  • Article
  • Open Access
5 Citations
2,516 Views
17 Pages

13 May 2024

Photovoltaic (PV) power is subject to variability, influenced by factors such as meteorological conditions. This variability introduces uncertainties in forecasting, underscoring the necessity for enhanced forecasting models to support the large-scal...

  • Article
  • Open Access
44 Citations
13,441 Views
17 Pages

Water-Quality Prediction Based on H2O AutoML and Explainable AI Techniques

  • Hamza Ahmad Madni,
  • Muhammad Umer,
  • Abid Ishaq,
  • Nihal Abuzinadah,
  • Oumaima Saidani,
  • Shtwai Alsubai,
  • Monia Hamdi and
  • Imran Ashraf

25 January 2023

Rapid expansion of the world’s population has negatively impacted the environment, notably water quality. As a result, water-quality prediction has arisen as a hot issue during the last decade. Existing techniques fall short in terms of good ac...

  • Article
  • Open Access
25 Citations
4,602 Views
19 Pages

16 July 2023

In the case of missing data, traffic forecasting becomes challenging. Many existing studies on traffic flow forecasting with missing data often overlook the relationship between data imputation and external factors. To address this gap, this study pr...

  • Article
  • Open Access
63 Citations
6,762 Views
20 Pages

Learning-Based Adaptive Imputation Method with kNN Algorithm for Missing Power Data

  • Minkyung Kim,
  • Sangdon Park,
  • Joohyung Lee,
  • Yongjae Joo and
  • Jun Kyun Choi

21 October 2017

This paper proposes a learning-based adaptive imputation method (LAI) for imputing missing power data in an energy system. This method estimates the missing power data by using the pattern that appears in the collected data. Here, in order to capture...

  • Article
  • Open Access
23 Citations
4,441 Views
21 Pages

An Integrated Machine Learning Approach for Congestive Heart Failure Prediction

  • M. Sheetal Singh,
  • Khelchandra Thongam,
  • Prakash Choudhary and
  • P. K. Bhagat

Congestive heart failure (CHF) is one of the primary sources of mortality and morbidity among the global population. Over 26 million individuals globally are affected by heart disease, and its prevalence is rising by 2% yearly. With advances in healt...

  • Article
  • Open Access
9 Citations
2,442 Views
20 Pages

Privacy-Preserving Vertical Federated KNN Feature Imputation Method

  • Wenyou Du,
  • Yichen Wang,
  • Guanglei Meng and
  • Yuming Guo

Federated learning stands as a pivotal component in the construction of data infrastructure. It significantly fortifies the safety and reliability of data circulation links, facilitating credible sharing and openness among diverse subjects. The prese...

  • Article
  • Open Access
1 Citation
1,801 Views
22 Pages

7 September 2023

This paper introduces a modified local linear estimator (LLR) for partially linear additive models (PLAM) when the response variable is subject to random right-censoring. In the case of modeling right-censored data, PLAM offers a more flexible and re...

  • Article
  • Open Access
9 Citations
4,711 Views
18 Pages

5 December 2018

Wall-to-wall tree-lists information (lists of species and diameter for every tree) at a regional scale is required for managers to assess forest sustainability and design effective forest management strategies. Currently, the k-nearest neighbors (kNN...

  • Article
  • Open Access
33 Citations
5,464 Views
20 Pages

Evaluating k-Nearest Neighbor (kNN) Imputation Models for Species-Level Aboveground Forest Biomass Mapping in Northeast China

  • Yuanyuan Fu,
  • Hong S. He,
  • Todd J. Hawbaker,
  • Paul D. Henne,
  • Zhiliang Zhu and
  • David R. Larsen

25 August 2019

Quantifying spatially explicit or pixel-level aboveground forest biomass (AFB) across large regions is critical for measuring forest carbon sequestration capacity, assessing forest carbon balance, and revealing changes in the structure and function o...

  • Article
  • Open Access
1 Citation
914 Views
42 Pages

Advances in Imputation Strategies Supporting Peak Storm Surge Surrogate Modeling

  • WoongHee Jung,
  • Christopher Irwin,
  • Alexandros A. Taflanidis,
  • Norberto C. Nadal-Caraballo,
  • Luke A. Aucoin and
  • Madison C. Yawn

Surrogate models are widely recognized as effective, data-driven predictive tools for storm surge risk assessment. For such applications, surrogate models (referenced also as emulators or metamodels) are typically developed using existing databases o...

  • Article
  • Open Access
4 Citations
3,790 Views
21 Pages

Addressing Missing Data Challenges in Geriatric Health Monitoring: A Study of Statistical and Machine Learning Imputation Methods

  • Gabriel-Vasilică Sasu,
  • Bogdan-Iulian Ciubotaru,
  • Nicolae Goga and
  • Andrei Vasilățeanu

21 January 2025

In geriatric healthcare, missing data pose significant challenges, especially in systems used for frailty monitoring in elderly individuals. This study explores advanced imputation techniques used to enhance data quality and maintain model performanc...

  • Article
  • Open Access
720 Views
30 Pages

24 October 2025

This study addresses the challenge of accurate classification under missing data conditions by integrating multiple imputation strategies with discriminant analysis frameworks. The proposed approach evaluates six imputation methods (Mean, Regression,...

  • Article
  • Open Access
15 Citations
3,949 Views
14 Pages

NMF-Based Approach for Missing Values Imputation of Mass Spectrometry Metabolomics Data

  • Jingjing Xu,
  • Yuanshan Wang,
  • Xiangnan Xu,
  • Kian-Kai Cheng,
  • Daniel Raftery and
  • Jiyang Dong

24 September 2021

In mass spectrometry (MS)-based metabolomics, missing values (NAs) may be due to different causes, including sample heterogeneity, ion suppression, spectral overlap, inappropriate data processing, and instrumental errors. Although a number of methodo...

  • Article
  • Open Access
13 Citations
4,621 Views
21 Pages

In clinical datasets, missing data often occur due to various reasons including non-response, data corruption, and errors in data collection or processing. Such missing values can lead to biased statistical analyses, reduced statistical power, and po...

  • Article
  • Open Access
32 Citations
5,907 Views
22 Pages

A Workflow for Missing Values Imputation of Untargeted Metabolomics Data

  • Tariq Faquih,
  • Maarten van Smeden,
  • Jiao Luo,
  • Saskia le Cessie,
  • Gabi Kastenmüller,
  • Jan Krumsiek,
  • Raymond Noordam,
  • Diana van Heemst,
  • Frits R. Rosendaal and
  • Dennis O. Mook-Kanamori
  • + 2 authors

26 November 2020

Metabolomics studies have seen a steady growth due to the development and implementation of affordable and high-quality metabolomics platforms. In large metabolite panels, measurement values are frequently missing and, if neglected or sub-optimally i...

  • Article
  • Open Access
2 Citations
4,581 Views
16 Pages

Bridging the Gap: Missing Data Imputation Methods and Their Effect on Dementia Classification Performance

  • Federica Aracri,
  • Maria Giovanna Bianco,
  • Andrea Quattrone and
  • Alessia Sarica

Background/Objectives: Missing data is a common challenge in neuroscience and neuroimaging studies, especially in the context of neurodegenerative disorders such as Mild Cognitive Impairment (MCI) and Alzheimer’s Disease (AD). Inadequate handli...

  • Article
  • Open Access
27 Citations
7,682 Views
20 Pages

The SECOM dataset contains information about a semiconductor production line, entailing the products that failed the in-house test line and their attributes. This dataset, similar to most semiconductor manufacturing data, contains missing values, imb...

  • Article
  • Open Access
15 Citations
1,794 Views
16 Pages

Partial Discharge Localization through k-NN and SVM

  • Permit Mathuhu Sekatane and
  • Pitshou Bokoro

3 November 2023

Power transformers are essential for the distribution and transmission of electricity, but they are prone to degradation due to faults early on. Partial Discharge (PD) is the most significant pointer of insulation breakdown in high-voltage apparatus....

  • Article
  • Open Access
17 Citations
3,713 Views
17 Pages

Embedded Data Imputation for Environmental Intelligent Sensing: A Case Study

  • Laura Erhan,
  • Mario Di Mauro,
  • Ashiq Anjum,
  • Ovidiu Bagdasar,
  • Wei Song and
  • Antonio Liotta

23 November 2021

Recent developments in cloud computing and the Internet of Things have enabled smart environments, in terms of both monitoring and actuation. Unfortunately, this often results in unsustainable cloud-based solutions, whereby, in the interest of simpli...

  • Article
  • Open Access
5 Citations
2,732 Views
18 Pages

A Method of Pruning and Random Replacing of Known Values for Comparing Missing Data Imputation Models for Incomplete Air Quality Time Series

  • Luis Alfonso Menéndez García,
  • Marta Menéndez Fernández,
  • Violetta Sokoła-Szewioła,
  • Laura Álvarez de Prado,
  • Almudena Ortiz Marqués,
  • David Fernández López and
  • Antonio Bernardo Sánchez

25 June 2022

The data obtained from air quality monitoring stations, which are used to carry out studies using data mining techniques, present the problem of missing values. This paper describes a research work on missing data imputation. Among the most common me...

  • Article
  • Open Access
1 Citation
998 Views
13 Pages

Missing Data in OHCA Registries: How Imputation Methods Affect Research Conclusions—Paper I

  • Stella Jinran Zhan,
  • Seyed Ehsan Saffari,
  • Marcus Eng Hock Ong and
  • Fahad Javaid Siddiqui

8 September 2025

Background/Objectives: Clinical observational studies often encounter missing data, which complicates association evaluation with reduced bias while accounting for confounders. This is particularly challenging in multi-national registries such as tho...

  • Article
  • Open Access
31 Citations
4,486 Views
16 Pages

9 April 2023

The Air Quality Index (AQI) dataset contains information on measurements of pollutants and ambient air quality conditions at a certain location that can be used to predict air quality. Unfortunately, this dataset often has many missing observations and...

  • Article
  • Open Access
13 Citations
4,657 Views
19 Pages

29 October 2020

Medical data usually have missing values; hence, imputation methods have become an important issue. In previous studies, many imputation methods based on variable data had a multivariate normal distribution, such as expectation-maximization and regre...

  • Article
  • Open Access
23 Citations
3,613 Views
23 Pages

Discrete Missing Data Imputation Using Multilayer Perceptron and Momentum Gradient Descent

  • Hu Pan,
  • Zhiwei Ye,
  • Qiyi He,
  • Chunyan Yan,
  • Jianyu Yuan,
  • Xudong Lai,
  • Jun Su and
  • Ruihan Li

28 July 2022

Data are a strategic resource for industrial production, and an efficient data-mining process will increase productivity. However, there exist many missing values in data collected in real life due to various problems. Because the missing data may re...

  • Article
  • Open Access
14 Citations
6,873 Views
31 Pages

Complex Data Imputation by Auto-Encoders and Convolutional Neural Networks—A Case Study on Genome Gap-Filling

  • Luca Cappelletti,
  • Tommaso Fontana,
  • Guido Walter Di Donato,
  • Lorenzo Di Tucci,
  • Elena Casiraghi and
  • Giorgio Valentini

Missing data imputation has been a hot topic in the past decade, and many state-of-the-art works have been presented to propose novel, interesting solutions that have been applied in a variety of fields. In the past decade, the successful results ach...

  • Article
  • Open Access
25 Citations
4,261 Views
29 Pages

Prediction of Diameter Distributions with Multimodal Models Using LiDAR Data in Subtropical Planted Forests

  • Zhengnan Zhang,
  • Lin Cao,
  • Christopher Mulverhill,
  • Hao Liu,
  • Yong Pang and
  • Zengyuan Li

4 February 2019

Tree diameter distributions are essential for the calculation of stem volume and biomass, as well as simulation of growth and yield and to understand timber assortments. Accurate and reliable prediction of tree diameter distributions is critical for...

  • Article
  • Open Access
8 Citations
6,375 Views
12 Pages

Background: Heart failure poses a significant global health challenge, with high rates of readmission and mortality. Accurate models to predict these outcomes are essential for effective patient management. This study investigates the impact of data...

  • Article
  • Open Access
26 Citations
3,681 Views
18 Pages

Improving Human Activity Monitoring by Imputation of Missing Sensory Data: Experimental Study

  • Ivan Miguel Pires,
  • Faisal Hussain,
  • Nuno M. Garcia and
  • Eftim Zdravevski

17 September 2020

The automatic recognition of human activities with sensors available in off-the-shelf mobile devices has been the subject of different research studies in recent years. It may be useful for the monitoring of elderly people to present warning situatio...

  • Article
  • Open Access
3 Citations
1,973 Views
18 Pages

Comparison of Models for Missing Data Imputation in PM-2.5 Measurement Data

  • Ju-Yong Lee,
  • Seung-Hee Han,
  • Jin-Goo Kang,
  • Chae-Yeon Lee,
  • Jeong-Beom Lee,
  • Hyeun-Soo Kim,
  • Hui-Young Yun and
  • Dae-Ryun Choi

9 April 2025

The accurate monitoring and analysis of PM-2.5 are critical for improving air quality and formulating public health policies. However, environmental data often contain missing values due to equipment failures, data collection errors, or extreme weath...

  • Article
  • Open Access
69 Citations
12,627 Views
18 Pages

Use of Machine Learning Techniques in Soil Classification

  • Yaren Aydın,
  • Ümit Işıkdağ,
  • Gebrail Bekdaş,
  • Sinan Melih Nigdeli and
  • Zong Woo Geem

28 January 2023

In the design of reliable structures, the soil classification process is the first step, which involves costly and time-consuming work including laboratory tests. Machine learning (ML), which has wide use in many scientific fields, can be utilized fo...

  • Article
  • Open Access
125 Views
12 Pages

Missing Data in OHCA Registries: How Multiple Imputation Methods Affect Research Conclusions—Paper II

  • Stella Jinran Zhan,
  • Seyed Ehsan Saffari,
  • Marcus Eng Hock Ong and
  • Fahad Javaid Siddiqui

16 January 2026

Background/Objectives: Missing data in clinical observational studies, such as out-of-hospital cardiac arrest (OHCA) registries, can compromise statistical validity. Single imputation methods are simple alternatives to complete-case analysis (CCA) bu...

  • Article
  • Open Access
410 Views
27 Pages

28 October 2025

The integrity of real-time monitoring data is paramount to the accuracy of scientific research and the reliability of decision-making. However, data incompleteness arising from transmission interruptions or extreme weather disrupting equipment operat...

  • Article
  • Open Access
32 Citations
6,214 Views
16 Pages

Missing Data Imputation in the Internet of Things Sensor Networks

  • Benjamin Agbo,
  • Hussain Al-Aqrabi,
  • Richard Hill and
  • Tariq Alsboui

The Internet of Things (IoT) has had a tremendous impact on the evolution and adoption of information and communication technology. In the modern world, data are generated by individuals and collected automatically by physical objects that are fitted...

  • Article
  • Open Access
10 Citations
2,753 Views
36 Pages

Integration of Node Classification in Storm Surge Surrogate Modeling

  • Aikaterini P. Kyprioti,
  • Alexandros A. Taflanidis,
  • Norberto C. Nadal-Caraballo,
  • Madison C. Yawn and
  • Luke A. Aucoin

Surrogate models, also referenced as metamodels, have emerged as attractive data-driven, predictive models for storm surge estimation. They are calibrated based on an existing database of synthetic storm simulations and can provide fast-to-compute ap...

  • Article
  • Open Access
156 Citations
17,042 Views
20 Pages

Influence of Missing Values Substitutes on Multivariate Analysis of Metabolomics Data

  • Piotr S. Gromski,
  • Yun Xu,
  • Helen L. Kotze,
  • Elon Correa,
  • David I. Ellis,
  • Emily Grace Armitage,
  • Michael L. Turner and
  • Royston Goodacre

16 June 2014

Missing values are known to be problematic for the analysis of gas chromatography-mass spectrometry (GC-MS) metabolomics data. Typically these values cover about 10%–20% of all data and can originate from various backgrounds, including analytical, co...

  • Article
  • Open Access
104 Citations
10,488 Views
18 Pages

8 January 2019

Over the past decade, PV power plants have increasingly contributed to power generation. However, PV power generation widely varies due to environmental factors; thus, the accurate forecasting of PV generation becomes essential. Meanwhile, weather da...

  • Article
  • Open Access
50 Citations
6,377 Views
13 Pages

Missing Data Imputation for Geolocation-based Price Prediction Using KNN–MCF Method

  • Karshiev Sanjar,
  • Olimov Bekhzod,
  • Jaesoo Kim,
  • Anand Paul and
  • Jeonghong Kim

Accurate house price forecasts are very important for formulating national economic policies. In this paper, we offer an effective method to predict houses’ sale prices. Our algorithm includes one-hot encoding to convert text data into numeric...

  • Article
  • Open Access
3 Citations
2,652 Views
17 Pages

17 December 2024

Mass-spectrometry-based proteomics frequently utilizes label-free quantification strategies due to their cost-effectiveness, methodological simplicity, and capability to identify large numbers of proteins within a single analytical run. Despite these...

  • Article
  • Open Access
55 Citations
6,471 Views
26 Pages

Fine-Tuning Fuzzy KNN Classifier Based on Uncertainty Membership for the Medical Diagnosis of Diabetes

  • Hanaa Salem,
  • Mahmoud Y. Shams,
  • Omar M. Elzeki,
  • Mohamed Abd Elfattah,
  • Jehad F. Al-Amri and
  • Shaima Elnazer

18 January 2022

Diabetes, a metabolic disease in which the blood glucose level rises over time, is one of the most common chronic diseases at present. It is critical to accurately predict and classify diabetes to reduce the severity of the disease and treat it early...

  • Article
  • Open Access
2 Citations
1,993 Views
16 Pages

Prioritization of Fluorescence In Situ Hybridization (FISH) Probes for Differentiating Primary Sites of Neuroendocrine Tumors with Machine Learning

  • Lucas Pietan,
  • Hayley Vaughn,
  • James R. Howe,
  • Andrew M. Bellizzi,
  • Brian J. Smith,
  • Benjamin Darbro,
  • Terry Braun and
  • Thomas Casavant

12 December 2023

Determining neuroendocrine tumor (NET) primary sites is pivotal for patient care as pancreatic NETs (pNETs) and small bowel NETs (sbNETs) have distinct treatment approaches. The diagnostic power and prioritization of fluorescence in situ hybridizatio...

  • Article
  • Open Access
1 Citation
982 Views
16 Pages

Missing Data in Orthopaedic Clinical Outcomes Research: A Sensitivity Analysis of Imputation Techniques Utilizing a Large Multicenter Total Shoulder Arthroplasty Database

  • Kevin A. Hao,
  • Terrie Vasilopoulos,
  • Josie Elwell,
  • Christopher P. Roche,
  • Keegan M. Hones,
  • Jonathan O. Wright,
  • Joseph J. King,
  • Thomas W. Wright,
  • Ryan W. Simovitch and
  • Bradley S. Schoch

29 May 2025

Background: When missing data are present in clinical outcomes studies, complete-case analysis (CCA) is often performed, whereby patients with missing data are excluded. While simple, CCA analysis may impart selection bias and reduce statistical powe...

  • Article
  • Open Access
19 Citations
4,613 Views
16 Pages

21 November 2022

Statistical analyses often require unbiased and reliable data completion. In this work, we imputed missing fine particulate matter (PM2.5) observations from eight years (2012–2019) of records in 59 air quality monitoring (AQM) stations in Israe...

  • Article
  • Open Access

26 January 2026

In Internet of Things (IoT) systems, data collected by geographically distributed sensors is often incomplete due to device failures, harsh deployment conditions, energy constraints, and unreliable communication. Such data gaps can significantly degr...

  • Article
  • Open Access
4 Citations
2,251 Views
16 Pages

Intelligent Identification and Order-Sensitive Correction Method of Outliers from Multi-Data Source Based on Historical Data Mining

  • Guangyu Chen,
  • Zhengyang Zhu,
  • Li Yang,
  • Wenhao Huang,
  • Yuzhuo Zhang,
  • Gang Lin and
  • Shengjie Zhang

7 September 2022

In recent years, outliers caused by manual operation errors and equipment acquisition failures often occur, bringing challenges to big data analysis. In view of the difficulties in identifying and correcting outliers of multi-source data, an intellig...

  • Review
  • Open Access
63 Views
27 Pages

A Survey on Missing Data Generation in Networks

  • Qi Shao,
  • Ruizhe Shi,
  • Xiaoyu Zhang and
  • Duxin Chen

20 January 2026

The prevalence of massive, multi-scale, high-dimensional, and dynamic data sets resulting from advances in information and network communication technologies is frequently hampered by data incompleteness, a consequence of complex network structures a...

  • Article
  • Open Access
1 Citation
1,664 Views
19 Pages

18 December 2024

Floods are a significant and pervasive threat globally, exacerbated by climate change and increasing extreme weather events. The Gravity Recovery and Climate Experiment (GRACE) and its follow-on mission (GRACE-FO) provide crucial insights into terres...
