MDPI - Publisher of Open Access Journals

32 pages, 502 KiB

Open Access Article

Bayesian Random Forest with Multiple Imputation by Chain Equations for High-Dimensional Missing Data: A Simulation Study

by Oyebayo Ridwan Olaniran and Ali Rashash R. Alzahrani

Mathematics 2025, 13(6), 956; https://doi.org/10.3390/math13060956 - 13 Mar 2025

Cited by 1 | Viewed by 973

Abstract

The pervasive challenge of missing data in scientific research forces a critical trade-off: discarding incomplete observations risks significant information loss, while conventional imputation methods struggle to maintain accuracy in high-dimensional settings. Although approaches like multiple imputation (MI) and random forest (RF) proximity-based imputation offer improvements over naive deletion, they exhibit limitations in complex missing data scenarios or sparse high-dimensional settings. To address these gaps, we propose a novel integration of Multiple Imputation by Chained Equations (MICE) with Bayesian Random Forest (BRF), leveraging MICE’s iterative flexibility and BRF’s probabilistic robustness to enhance the imputation accuracy and downstream predictive performance. Our hybrid framework, BRF-MICE, combines the efficiency of MICE’s chained equations with BRF’s ability to quantify uncertainty through Bayesian tree ensembles, providing stable parameter estimates even under extreme missingness. We empirically validate this approach using synthetic datasets with controlled missingness mechanisms (MCAR, MAR, MNAR) and dimensionality, contrasting it against established methods, including RF and Bayesian Additive Regression Trees (BART). The results demonstrate that BRF-MICE achieves superior performance in classification and regression tasks, with a 15–20% lower error under varying missingness conditions compared to RF and BART while maintaining computational scalability. The method’s iterative Bayesian updates effectively propagate imputation uncertainty, reducing overconfidence in high-dimensional predictions, a key weakness of frequentist alternatives.

(This article belongs to the Section D1: Probability and Statistics)

20 pages, 932 KiB

Open Access Article

Gradient-Based Multiple Robust Learning Calibration on Data Missing-Not-at-Random via Bi-Level Optimization

by Shuxia Gong and Chen Ma

Entropy 2025, 27(2), 196; https://doi.org/10.3390/e27020196 - 13 Feb 2025

Viewed by 871

Abstract

Recommendation systems (RS) have become integral to numerous digital platforms and applications, ranging from e-commerce to content streaming. A critical problem in RS is that ratings are missing not at random (MNAR), because users self-select the items on which they give feedback. This biased selection of rating data results in inaccurate rating prediction for all user-item pairs. Doubly robust (DR) learning, which has been studied in many RS tasks, is unbiased when either a single imputation model or a single propensity model is accurate. In addition, multiple robust (MR) learning has been proposed with multiple imputation models and propensity models, and is unbiased when some linear combination of these imputation models and propensity models is correct. However, we claim that the imputed errors and propensity scores are miscalibrated in the MR method. In this paper, we propose a gradient-based calibrated multiple robust learning method to enhance the debiasing performance and reliability of the rating prediction model. Specifically, we propose to use bi-level optimization to solve for the weights and model coefficients of each propensity and imputation model in the MR framework. Moreover, we adopt the differentiable expected calibration error as part of the objective to optimize the model calibration quality directly. Experiments on three real-world datasets show that our method outperforms the state-of-the-art baselines.
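The doubly robust idea mentioned in this abstract can be illustrated with a minimal sketch. It assumes the standard DR form of the error estimate, in which each user-item pair contributes its imputed error plus, for observed pairs only, a propensity-weighted correction. The function name `dr_error` and the toy numbers are hypothetical and are not taken from the article.

```python
def dr_error(errors, imputed, observed, propensity):
    """Doubly robust estimate of the mean prediction error over all pairs.

    errors[i]     : true prediction error, known only where observed[i] is truthy
    imputed[i]    : imputed error from an imputation model (hypothetical values)
    observed[i]   : 1 if the rating was observed, else 0
    propensity[i] : estimated probability that the rating is observed
    """
    total = 0.0
    for e, e_hat, o, p in zip(errors, imputed, observed, propensity):
        total += e_hat                 # imputed error for every pair
        if o:
            total += (e - e_hat) / p   # inverse-propensity correction on observed pairs
    return total / len(imputed)


# If the imputed errors are exact, the correction vanishes and the propensities
# do not matter, which is one half of the "doubly robust" property.
true_errors = [1.0, None, 3.0, None]   # None marks an unobserved rating
exact_imputation = [1.0, 2.0, 3.0, 4.0]
print(dr_error(true_errors, exact_imputation, [1, 0, 1, 0], [0.5, 0.5, 0.9, 0.1]))
```

The other half of the property, unbiasedness when the propensities are correct, holds in expectation over the observation process and is therefore not visible in a single deterministic call.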

(This article belongs to the Special Issue Causal Inference in Recommender Systems)

21 pages, 2199 KiB

Open Access Article

Addressing Missing Data Challenges in Geriatric Health Monitoring: A Study of Statistical and Machine Learning Imputation Methods

by Gabriel-Vasilică Sasu, Bogdan-Iulian Ciubotaru, Nicolae Goga and Andrei Vasilățeanu

Sensors 2025, 25(3), 614; https://doi.org/10.3390/s25030614 - 21 Jan 2025

Cited by 1 | Viewed by 1958

Abstract

In geriatric healthcare, missing data pose significant challenges, especially in systems used for frailty monitoring in elderly individuals. This study explores advanced imputation techniques used to enhance data quality and maintain model performance in a system designed to provide insights into frailty. We introduce three missing data mechanisms, Missing Completely at Random (MCAR), Missing at Random (MAR), and Missing Not at Random (MNAR), into a dataset collected from smart bracelets, simulating real-world conditions. Imputation methods, including Expectation–Maximization (EM), matrix completion, Bayesian networks, K-Nearest Neighbors (KNN), Support Vector Machines (SVMs), Generative Adversarial Imputation Networks (GAINs), Variational Autoencoder (VAE), and GRU-D, were evaluated based on normalized Mean Squared Error (MSE), Mean Absolute Error (MAE), and R² metrics. The results demonstrate that KNN and SVM consistently outperform other methods across all three mechanisms due to their ability to adapt to diverse patterns of missingness. Specifically, KNN and SVM excel in MAR conditions by leveraging observed data relationships to accurately infer missing values, while their robustness to randomness enables superior performance under MCAR scenarios. In MNAR contexts, KNN and SVM effectively handle unobserved dependencies by identifying underlying patterns in the data, outperforming methods like GRU-D and VAE. These findings highlight the importance of selecting imputation methods based on the characteristics of missing data mechanisms, emphasizing the versatility and reliability of KNN and SVM in healthcare applications. This study advocates for hybrid approaches in healthcare applications like the cINnAMON project, which supports elderly individuals at risk of frailty through non-intrusive home monitoring systems.
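The three mechanisms named in this abstract differ only in what the probability of a value going missing depends on. The sketch below illustrates that distinction on toy data. It is not the procedure used in the study: the function `make_missing`, the median threshold, and the drop probabilities are all assumptions chosen for illustration.

```python
import random


def make_missing(rows, mechanism, rate=0.3, seed=0):
    """Return a copy of rows (a list of (x, y) pairs) with some y set to None.

    MCAR: every y is dropped with the same probability.
    MAR:  the drop probability depends only on x, which is always observed.
    MNAR: the drop probability depends on the value of y itself.
    """
    if not 0.0 <= rate <= 0.5:
        raise ValueError("rate must be between 0 and 0.5")
    rng = random.Random(seed)
    x_median = sorted(x for x, _ in rows)[len(rows) // 2]
    y_median = sorted(y for _, y in rows)[len(rows) // 2]
    masked = []
    for x, y in rows:
        if mechanism == "MCAR":
            p_drop = rate
        elif mechanism == "MAR":
            p_drop = 2 * rate if x > x_median else 0.0
        elif mechanism == "MNAR":
            p_drop = 2 * rate if y > y_median else 0.0
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        masked.append((x, None if rng.random() < p_drop else y))
    return masked


# Hypothetical toy data: x is always observed and y is correlated with x.
data_rng = random.Random(42)
toy_rows = []
for _ in range(500):
    x = data_rng.gauss(0.0, 1.0)
    toy_rows.append((x, x + data_rng.gauss(0.0, 0.5)))

for name in ("MCAR", "MAR", "MNAR"):
    observed = [y for _, y in make_missing(toy_rows, name) if y is not None]
    print(name, len(observed), round(sum(observed) / len(observed), 3))
```

Because the MAR and MNAR variants here drop only values above a median, the mean of the remaining y values tends to be pulled downward, whereas under MCAR it stays close to the full-data mean.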

(This article belongs to the Special Issue Non-Intrusive Sensors for Human Activity Detection and Recognition)

17 pages, 2080 KiB

Open Access Article

Comprehensive Evaluation of Advanced Imputation Methods for Proteomic Data Acquired via the Label-Free Approach

by Grzegorz Wryk, Andrzej Gawor and Ewa Bulska

Int. J. Mol. Sci. 2024, 25(24), 13491; https://doi.org/10.3390/ijms252413491 - 17 Dec 2024

Viewed by 1341

Abstract

Mass-spectrometry-based proteomics frequently utilizes label-free quantification strategies due to their cost-effectiveness, methodological simplicity, and capability to identify large numbers of proteins within a single analytical run. Despite these advantages, the prevalence of missing values (MV), which can affect up to 50% of the data matrix, poses a significant challenge by reducing the accuracy, reproducibility, and interpretability of the results. Consequently, effective handling of missing values is crucial for reliable quantitative analysis in proteomic studies. This study systematically evaluated the performance of selected imputation methods for addressing missing values in proteomic datasets. Two protein identification algorithms, FragPipe and MaxQuant, were employed to generate datasets, enabling an assessment of their influence on imputation efficacy. Ten imputation methods representing three methodological categories were analyzed: single-value (LOD, ND, SampMin), local-similarity (kNN, LLS, RF), and global-similarity (LSA, BPCA, PPCA, SVD) approaches. The study also investigated the impact of data logarithmization on imputation performance. The evaluation process was conducted in two stages. First, performance metrics including normalized root mean square error (NRMSE) and the area under the receiver operating characteristic (ROC) curve (AUC) were applied to datasets with artificially introduced missing values. The datasets were designed to mimic varying MV rates (10%, 25%, 50%) and proportions of values missing not at random (MNAR) (0%, 20%, 40%, 80%, 100%). This step enabled the assessment of the effect of data characteristics on the relative effectiveness of the imputation methods. Second, the imputation strategies were applied to real proteomic datasets containing natural missing values, focusing on the true-positive (TP) classification of proteins to evaluate their practical utility. The findings highlight that local-similarity-based methods, particularly random forest (RF) and local least-squares (LLS), consistently exhibit robust performance across varying MV scenarios. Furthermore, data logarithmization significantly enhances the effectiveness of global-similarity methods, suggesting it as a beneficial preprocessing step prior to imputation. The study underscores the importance of tailoring imputation strategies to the specific characteristics of the data to maximize the reliability of label-free quantitative proteomics. Interestingly, while the choice of protein identification algorithm (FragPipe vs. MaxQuant) had minimal influence on the overall imputation error, differences in the number of proteins classified as true positives revealed more nuanced effects, emphasizing the interplay between imputation strategies and downstream analysis outcomes. These findings provide a comprehensive framework for improving the accuracy and reproducibility of proteomic analyses through an informed selection of imputation approaches.
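The abstract names NRMSE as an evaluation metric but does not state how it is normalized. The sketch below assumes one common convention, dividing the mean squared error by the variance of the true values, so that imputing every value with the mean scores exactly 1. The function name `nrmse` and the sample values are hypothetical.

```python
import math


def nrmse(true_vals, imputed_vals):
    """Normalized RMSE, assuming normalization by the variance of the true values."""
    if len(true_vals) != len(imputed_vals) or not true_vals:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(true_vals)
    mean = sum(true_vals) / n
    variance = sum((t - mean) ** 2 for t in true_vals) / n
    if variance == 0:
        raise ValueError("true values have zero variance")
    mse = sum((t - i) ** 2 for t, i in zip(true_vals, imputed_vals)) / n
    return math.sqrt(mse / variance)


# Values that were masked artificially, so the ground truth is known.
truth = [10.0, 12.0, 14.0, 16.0]
print(nrmse(truth, truth))                     # perfect imputation
print(nrmse(truth, [13.0, 13.0, 13.0, 13.0]))  # mean imputation
```

Under this convention, a value below 1 means the method does better than mean imputation on the masked entries.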

(This article belongs to the Special Issue Role of Proteomics in Human Diseases and Infections)

15 pages, 1045 KiB

Open Access Article

Adaptive Imputation of Irregular Truncated Signals with Machine Learning

by Tyler Ward, Kouroush Jenab and Jorge Ortega-Moody

Appl. Sci. 2024, 14(15), 6828; https://doi.org/10.3390/app14156828 - 5 Aug 2024

Cited by 1 | Viewed by 1223

Abstract

In modern advanced manufacturing systems, the use of smart sensors and other Internet of Things (IoT) technology to provide real-time feedback to operators about the condition of various machinery or other equipment is prevalent. A notable issue in such IoT-based advanced manufacturing systems is the problem of connectivity, where a dropped Internet connection can lead to the loss of important condition data from a machine. Such gaps in the data, which we call irregular truncated signals, can lead to incorrect assumptions about the status of a machine and other flawed decision-making processes. This paper presents an adaptive data imputation framework based on machine learning (ML) algorithms to assess whether the missing data in a signal are missing completely at random (MCAR), missing at random (MAR), or missing not at random (MNAR) and automatically select an appropriate ML-based data imputation model to deal with the missing data. Our results demonstrate the potential for applying ML algorithms to the challenge of irregularly truncated signals, as well as the capability of our adaptive framework to intelligently solve this issue.

(This article belongs to the Special Issue Smart Design and Advanced Manufacturing: Integrating Emerging Technologies for Improved Production Processes)

22 pages, 64544 KiB

Open Access Article

Statuary Qualities of White and Black Göktepe Identified in the Hispanic Valdetorres de Jarama Marble Collection

by Maria Pilar Lapuente Mercadal and Trinidad Nogales-Basarrate

Minerals 2024, 14(8), 797; https://doi.org/10.3390/min14080797 - 3 Aug 2024

Viewed by 1068

Abstract

This paper focuses on the role of the most common mineralogical techniques applied to the identification of the different statuary qualities in white, grey, and black Göktepe marble. For this purpose, the case of a Roman sculpture marble collection from the rural villa of Valdetorres de Jarama (central Iberia), dating to the 4th century AD, is presented. The mythological statuary, combining white, grey, and black marbles, is one of the most outstanding marble collections in the Aphrodisian style found in Hispania. The analytical results (achieved through petrography, cathodoluminescence, C and O isotopes, and Sr and Mn concentrations) support the identification of two varieties of “black” Göktepe, traditionally referred to as bigio morato and bigio antico, as well as the best statuary quality of white Göktepe. In addition, the analytical identification of other Asiatic marbles in the Valdetorres collection, a coarse-grained white marble originally from the quarries of Aphrodisias city and one small piece identified as Carian red from Iasos, corroborates the already suggested strong connection existing between artists and the stone material they chose for their works. Finally, the identification carried out on the marble of the bases that served as seats for the sculptures is noteworthy, as it is a white marble of lower quality whose analytical characteristics are consistent with the Microasiatic marble of Denizli. The use of these exotic and exceptional raw materials confirms the taste for luxury and decorative richness in Late Antique Hispanic rural villae and contributes to a better understanding of the distribution of Aphrodisian production and trade networks with the Western Roman provinces.

(This article belongs to the Special Issue Characterization and Provenance Analysis of Ancient Stone Materials: Insights from Mineralogy, Petrology and Geochemistry)

28 pages, 12834 KiB

Open Access Article

Natural Dyes in Embroideries of Byzantine Tradition, the Collection of Embroidered Aëres and Epitaphioi in the National Museum of Art of Romania

by Irina Petroviciu, Emanuela Cernea, Iolanda Turcu, Silvana Vasilca and Ina Vanden Berghe

Heritage 2024, 7(6), 3248-3275; https://doi.org/10.3390/heritage7060153 - 11 Jun 2024

Viewed by 1719

Abstract

The medieval textiles collection of the National Museum of Art of Romania (MNAR) has been in place since 1865 and nowadays preserves about 1000 medieval and pre-modern weavings and embroideries. These extremely valuable objects, dated between the 14th and the 19th centuries, are mainly religious embroidered garments and veils with special significance in the Byzantine liturgy. Ecclesiastical embroideries of Byzantine tradition are characterized by a complex technique: metallic threads with a silk core, metallic wires and coloured silk threads are couched over padding on layers of silk and cellulosic supports so as to create relief through light reflection. The silk supports and the sewing threads are coloured, mainly in red, blue, green and yellow hues, and analytical investigations of the dyes used in embroideries preserved in the MNAR, in the Putna and Sucevița Monasteries, have been reported in previous studies by the corresponding author. The present work continues the approach with research into dyes in about 25 aëres and epitaphioi from the MNAR collection. Considering their privileged function in the liturgical ritual, these luxurious pieces embroidered with silver, gilded silver or coloured silk threads and decorated with pearls, sequins or semi-precious stones are the most faithful description of the stylistic and technological evolution of the art of post-Byzantine embroidery in the Romanian provinces. The data resulting from the present research will improve the knowledge regarding this topic. Dye analysis was performed by liquid chromatography with diode array detection, while fibres were characterized by infrared spectroscopy (with attenuated total reflectance) and optical microscopy. The biological sources identified (carminic acid-based dyes, redwood, dyer’s broom, weld, indigo-based dyes) will be discussed in correspondence with their use in the embroidery technique: support, lining and embroidery threads, together with other sources previously reported on Byzantine embroideries in Romanian collections, and in similar objects preserved at Holy Mount Athos.

(This article belongs to the Special Issue Dyes in History and Archaeology 42)

17 pages, 3836 KiB

Open Access Article

Missing Data Statistics Provide Causal Insights into Data Loss in Diabetes Health Monitoring by Wearable Sensors

by Carlijn I. R. Braem, Utku S. Yavuz, Hermie J. Hermens and Peter H. Veltink

Sensors 2024, 24(5), 1526; https://doi.org/10.3390/s24051526 - 27 Feb 2024

Cited by 6 | Viewed by 2460

Abstract

Background: Data loss in wearable sensors is an inevitable problem that leads to misrepresentation during diabetes health monitoring. We systematically investigated missing wearable sensor data to gain causal insight into the mechanisms leading to missing data. Methods: Two-week-long data from a continuous glucose monitor and a Fitbit activity tracker recording heart rate (HR) and step count in free-living patients with type 2 diabetes mellitus were used. The gap size distribution was fitted with a Planck distribution to test for missing not at random (MNAR) and a difference between distributions was tested with a Chi-squared test. Significant missing data dispersion over time was tested with the Kruskal–Wallis test and Dunn post hoc analysis. Results: Data from 77 subjects resulted in 73 cleaned glucose, 70 HR and 68 step count recordings. The glucose gap sizes followed a Planck distribution. HR and step count gap frequency differed significantly (p < 0.001), and the missing data were therefore MNAR. In glucose, more missing data were found in the night (23:00–01:00), and in step count, more on measurement days 6 and 7 (p < 0.001). In both cases, missing data were caused by insufficient frequency of data synchronization. Conclusions: Our novel approach of investigating missing data statistics revealed the mechanisms for missing data in Fitbit and CGM data.
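The gap size distribution referred to in the Methods can only be fitted once the gaps have been extracted from a recording. The sketch below shows that first step under two assumptions that are not stated in the abstract: the signal is regularly sampled, and a missing sample is marked with `None`. The function name `gap_sizes` and the toy series are hypothetical, and the subsequent distribution fit is not shown.

```python
from collections import Counter


def gap_sizes(samples):
    """Return the length of each run of consecutive missing samples (None)."""
    gaps = []
    run = 0
    for value in samples:
        if value is None:
            run += 1
        elif run:
            gaps.append(run)  # a gap has just ended
            run = 0
    if run:
        gaps.append(run)      # the series ends inside a gap
    return gaps


# Toy heart-rate series with three gaps of different lengths.
series = [72, None, None, 75, None, 74, None, None, None]
gaps = gap_sizes(series)
print(gaps)                           # lengths of the individual gaps
print(sorted(Counter(gaps).items()))  # histogram: (gap size, count)
```

The resulting histogram of gap sizes is the kind of empirical distribution that could then be compared against a candidate model.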

(This article belongs to the Special Issue Sensors/Sensing Technologies and Signal Processing in Continuous Health Monitoring)

14 pages, 3824 KiB

Open Access Editor’s Choice Article

A Machine Learning-Based Multiple Imputation Method for the Health and Aging Brain Study–Health Disparities

by Fan Zhang, Melissa Petersen, Leigh Johnson, James Hall, Raymond F. Palmer, Sid E. O’Bryant and on behalf of the Health and Aging Brain Study (HABS–HD) Study Team

Informatics 2023, 10(4), 77; https://doi.org/10.3390/informatics10040077 - 11 Oct 2023

Cited by 2 | Viewed by 3707

Abstract

The Health and Aging Brain Study–Health Disparities (HABS–HD) project seeks to understand the biological, social, and environmental factors that impact brain aging among diverse communities. A common issue for HABS–HD is missing data. It is impossible to achieve accurate machine learning (ML) if data contain missing values. Therefore, developing a new imputation methodology has become an urgent task for HABS–HD. The three missing data assumptions, (1) missing completely at random (MCAR), (2) missing at random (MAR), and (3) missing not at random (MNAR), necessitate distinct imputation approaches for each mechanism of missingness. Several popular imputation methods, including listwise deletion, min, mean, predictive mean matching (PMM), classification and regression trees (CART), and missForest, may result in biased outcomes and reduced statistical power when applied to downstream analyses such as testing hypotheses related to clinical variables or utilizing machine learning to predict AD or MCI. Moreover, these commonly used imputation techniques can produce unreliable estimates of missing values if they do not account for the missingness mechanisms or if there is an inconsistency between the imputation method and the missing data mechanism in HABS–HD. Therefore, we proposed a three-step workflow to handle missing data in HABS–HD: (1) missing data evaluation, (2) imputation, and (3) imputation evaluation. First, we explored the missingness in HABS–HD. Then, we developed a machine learning-based multiple imputation method (MLMI) for imputing missing values. We built four ML-based imputation models (support vector machine (SVM), random forest (RF), extreme gradient boosting (XGB), and lasso and elastic-net regularized generalized linear model (GLMNET)) and adapted the four ML-based models to multiple imputations using the simple averaging method. Lastly, we evaluated and compared MLMI with other common methods. Our results showed that the three-step workflow worked well for handling missing values in HABS–HD and the ML-based multiple imputation method outperformed other common methods in terms of prediction performance and change in distribution and correlation. The choice of missing data handling methodology has a significant impact on the accompanying statistical analyses of HABS–HD. The conceptual three-step workflow and the ML-based multiple imputation method perform well for our Alzheimer’s disease models. They can also be applied to other disease data analyses.

(This article belongs to the Special Issue Novel Informatics Algorithms and Applications to Biomedicine and Healthcare)

20 pages, 3324 KiB

Open Access Article

Traffic Status Prediction Based on Multidimensional Feature Matching and 2nd-Order Hidden Markov Model (HMM)

by Fei Li, Kai Liu and Jialiang Chen

Sustainability 2023, 15(20), 14671; https://doi.org/10.3390/su152014671 - 10 Oct 2023

Cited by 2 | Viewed by 1451

Abstract

Spatiotemporal data from urban road traffic are pivotal for intelligent transportation systems and urban planning. Nonetheless, missing data are a common challenge in traffic datasets due to equipment failures, communication issues, and monitoring limitations, and the missing not at random (MNAR) case is especially problematic. This research introduces an approach to address MNAR-type missing data in traffic status prediction, utilizing a multidimensional feature sequence and a second-order hidden Markov model (2nd-order HMM). First, this approach involves extracting spatiotemporal features for the preset data sections and spatial features for the sections to be predicted based on the traffic spatiotemporal characteristics. Second, using the extracted features, distinctive road traffic features are generated for each section. Furthermore, at specific intervals within the defined time period, nearest distance feature matching is introduced to ascertain the traffic attributes of the road section under prediction. Finally, relying on the matched status results, a 2nd-order HMM is employed to forecast the traffic status for subsequent moments within the defined time period. Experiments were carried out using datasets from Shenzhen City and compared against the hidden Markov models and contrast measure (HMM-C) method to affirm the efficacy of the proposed approach.

(This article belongs to the Special Issue Sustainable Transportation and Urban Planning)

19 pages, 8780 KiB

Open Access Article

Effect of Missing Data Types and Imputation Methods on Supervised Classifiers: An Evaluation Study

by Menna Ibrahim Gabr, Yehia Mostafa Helmy and Doaa Saad Elzanfaly

Big Data Cogn. Comput. 2023, 7(1), 55; https://doi.org/10.3390/bdcc7010055 - 22 Mar 2023

Cited by 9 | Viewed by 4138

Abstract

Data completeness is one of the most common challenges that hinder the performance of data analytics platforms. Different studies have assessed the effect of missing values on different classification models based on a single evaluation metric, namely, accuracy. However, accuracy on its own is a misleading measure of classifier performance because it does not account for class imbalance. This paper presents an experimental study that assesses the effect of incomplete datasets on the performance of five classification models. The analysis was conducted with different ratios of missing values in six datasets that vary in size, type, and balance. Moreover, for unbiased analysis, the performance of the classifiers was measured using three different metrics, namely, the Matthews correlation coefficient (MCC), the F1-score, and accuracy. The results show that the sensitivity of the supervised classifiers to missing data differs according to a set of factors. The most significant factor is the missing data pattern and ratio, followed by the imputation method, and then the type, size, and balance of the dataset. The sensitivity of the classifiers when data are missing due to the Missing Completely At Random (MCAR) pattern is lower than their sensitivity when data are missing due to the Missing Not At Random (MNAR) pattern. Furthermore, using the MCC as an evaluation measure better reflects the variation in the sensitivity of the classifiers to the missing data.
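The claim that accuracy alone can mislead on unbalanced data is easy to check with the standard confusion-matrix formulas for the three metrics named in this abstract. The sketch below is a generic illustration rather than the study's evaluation code. The function name `scores`, the toy counts, and the convention of returning 0 when a denominator is zero are assumptions.

```python
import math


def scores(tp, fp, fn, tn):
    """Return (accuracy, F1, MCC) for a binary confusion matrix."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1_denominator = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denominator if f1_denominator else 0.0
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: MCC is reported as 0 when it is undefined.
    mcc = (tp * tn - fp * fn) / mcc_denominator if mcc_denominator else 0.0
    return accuracy, f1, mcc


# Toy unbalanced test set: 5 positives and 95 negatives.
# A classifier that always predicts "negative" looks good on accuracy only.
print(scores(tp=0, fp=0, fn=5, tn=95))
# A perfect classifier on the same data.
print(scores(tp=5, fp=0, fn=0, tn=95))
```

The always-negative classifier reaches 95% accuracy while its F1-score and MCC are both 0, which is the kind of gap the abstract points to.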

(This article belongs to the Special Issue Machine Learning in Data Mining for Knowledge Discovery)

13 pages, 2527 KiB

Open Access Article

Missing Traffic Data Imputation with a Linear Generative Model Based on Probabilistic Principal Component Analysis

by Liping Huang, Zhenghuan Li, Ruikang Luo and Rong Su

Sensors 2023, 23(1), 204; https://doi.org/10.3390/s23010204 - 25 Dec 2022

Cited by 4 | Viewed by 2083

Abstract

Even with the ubiquitous sensing data in intelligent transportation systems, such as the mobile sensing of vehicle trajectories, traffic estimation still faces a missing data problem due to detector faults or the limited number of probe vehicles acting as mobile sensors. This missing data issue is an obstacle for many further explorations, e.g., link-based traffic status modeling. Although many studies have tackled this kind of problem, they mainly focus on the situation in which data are missing at random and ignore the differences between links with missing data. In practice, traffic speed data are always missing not at random (MNAR). How recovery of missing data differs across links has not yet been studied. In this paper, we propose a general linear model based on probabilistic principal component analysis (PPCA) for solving MNAR traffic speed data imputation. Furthermore, we propose a metric, i.e., Pearson score (p-score), for distinguishing links and investigate how the model performs on links with different p-score values. Experimental results show that the new model outperforms the typically used PPCA model, and missing data on links with higher p-score values can be better recovered.

(This article belongs to the Section Vehicular Sensing)

16 pages, 18210 KiB

Open Access Article

Mineralogical Insights to Identify Göktepe Marble in the Sculptural Program of Quinta Das Longas Villa (Lusitania)

by M. Pilar Lapuente Mercadal, Trinidad Nogales-Basarrate and Antonio Carvalho

Minerals 2021, 11(11), 1194; https://doi.org/10.3390/min11111194 - 27 Oct 2021

Cited by 9 | Viewed by 2423

Abstract

This archaeometric study is focused on the marble used in a group of fragmented sculptures found at the Roman villa of Quinta das Longas (Elvas, Portugal). Dating from the 4th century AD, the pieces are of remarkable quality and correspond to ideal and mythological figures from several iconographic cycles. The numerous fragments, all of very fine-grained white marble, are associated with the ornamentation of an impressive nymphaeum of the villa. Their high level of sculptural technique and style, the models followed and their typological similarity to other well-known parallels raise the hypothesis of a link with Aphrodisian workshops. Using a well-established multi-method approach, with optical microscopy, X-ray Powder Diffraction (XRPD), qualitative and quantitative cathodoluminescence (CL) by CL-Optical and CL-SEM, and stable C and O isotopic and trace element analytical techniques (IRMS and ICP-AES), together with complementary parameters obtained from electron paramagnetic resonance (EPR) and ⁸⁷Sr/⁸⁶Sr isotopes, the marble provenance can be identified with certainty. The results all point to the best quality of white Göktepe marble, confirming the stylistic connection to the ancient Carian sculptors.

(This article belongs to the Special Issue The Role of Minerals in Cultural and Geological Heritage)

18 pages, 1979 KiB

Open Access Article

Multiple Imputation Approaches Applied to the Missing Value Problem in Bottom-Up Proteomics

by Miranda L. Gardner and Michael A. Freitas

Int. J. Mol. Sci. 2021, 22(17), 9650; https://doi.org/10.3390/ijms22179650 - 6 Sep 2021

Cited by 26 | Viewed by 5728

Abstract

Analysis of differential abundance in proteomics data sets requires careful application of missing value imputation. Missing abundance values vary widely when performing comparisons across different sample treatments. For example, one would expect a consistent rate of “missing at random” (MAR) across batches of samples and varying rates of “missing not at random” (MNAR) depending on the inherent difference in sample treatments within the study. The missing value imputation strategy selected must therefore be the one that best accounts for both MAR and MNAR simultaneously. Several important issues must be considered when deciding the appropriate missing value imputation strategy: (1) when it is appropriate to impute data; (2) how to choose a method that reflects the combinatorial manner of MAR and MNAR that occurs in an experiment. This paper provides an evaluation of missing value imputation strategies used in proteomics and presents a case for the use of hybrid left-censored missing value imputation approaches that can handle the MNAR problem common to proteomics data.
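As a rough illustration of what a hybrid MAR/MNAR strategy can look like, the sketch below applies a simple rule per feature and per treatment group: heavy missingness within a group is treated as left-censored (MNAR) and filled with a low value, while light missingness is treated as MAR and filled with the group mean. The function name `hybrid_impute`, the `mnar_threshold` parameter, and the specific fill rules are assumptions made for this example; the abstract does not describe the algorithms the paper evaluates.

```python
import numpy as np


def hybrid_impute(X, groups, mnar_threshold=0.5):
    """Illustrative hybrid imputation (hypothetical rule, not the paper's method).

    X      : (samples, features) array, np.nan marks a missing abundance.
    groups : per-sample treatment labels, length == number of samples.
    """
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    out = X.copy()

    # Per-feature floor: the smallest observed abundance across all samples,
    # used here as a crude left-censored (below detection limit) fill value.
    floor = np.nanmin(X, axis=0)

    for g in np.unique(groups):
        rows = np.where(groups == g)[0]
        block = X[rows]
        missing = np.isnan(block)
        frac_missing = missing.mean(axis=0)

        for j in range(X.shape[1]):
            if not missing[:, j].any():
                continue
            if frac_missing[j] >= mnar_threshold:
                # Mostly absent within this treatment: assume MNAR.
                fill = floor[j]
            else:
                # Sporadic gaps: assume MAR and use the group mean.
                fill = np.nanmean(block[:, j])
            out[rows[missing[:, j]], j] = fill
    return out
```

In practice the floor value and the MAR step would usually be replaced by more careful estimators; the point of the sketch is only the per-group decision between the two mechanisms.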

(This article belongs to the Section Biochemistry)

16 pages, 635 KiB

Open Access Article

Entropy-Based Time Window Features Extraction for Machine Learning to Predict Acute Kidney Injury in ICU

by Chun-Te Huang, Rong-Ching Chang, Yi-Lu Tsai, Kai-Chih Pai, Tsai-Jung Wang, Chia-Tien Hsu, Cheng-Hsu Chen, Chien-Chung Huang, Min-Shian Wang, Lun-Chi Chen, Ruey-Kai Sheu, Chieh-Liang Wu and Chun-Ming Lai

Appl. Sci. 2021, 11(14), 6364; https://doi.org/10.3390/app11146364 - 9 Jul 2021

Cited by 1 | Viewed by 3671

Abstract

Acute kidney injury (AKI) refers to a rapid decline of kidney function and is manifested by decreasing urine output or an abnormal blood test (elevated serum creatinine). Electronic health records (EHRs) are fundamental for clinicians and machine learning algorithms to predict the clinical outcome of patients in the Intensive Care Unit (ICU). Early prediction of AKI could automatically warn clinicians to review the possible risk factors and act in advance to prevent it. However, the enormous amount of patient data usually forms a relatively incomplete data set and is very challenging for supervised machine learning. In this paper, we propose an entropy-based feature engineering framework for vital signs based on their frequency of records. In particular, we address the missing at random (MAR) and missing not at random (MNAR) types of missing data according to different clinical scenarios. Regarding its applicability, we applied it to establish a prediction model for future AKI in ICU patients using 4278 ICU admissions from a tertiary hospital. Our results show that the proposed entropy-based features are feasible for use in the AKI prediction model and that its performance improves as data availability increases. In addition, we study the performance of the AKI prediction model by comparing different time gaps and feature windows with the proposed vital sign entropy features. This work could be used as guidance for feature window selection and missing data processing during the development of a prediction model in the ICU.
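One plausible reading of an entropy feature built on recording frequency is the Shannon entropy of how a vital sign's measurements are distributed across sub-intervals of a feature window. The sketch below follows that reading; the function name `record_frequency_entropy`, the equal-width binning, and the `n_bins` parameter are assumptions for illustration, since the abstract does not give the exact definition used in the paper.

```python
import math


def record_frequency_entropy(timestamps, window_start, window_end, n_bins=6):
    """Shannon entropy (bits) of record counts across equal-width time bins.

    Hypothetical feature: evenly spread recordings give high entropy,
    recordings clustered in one bin (or absent) give 0.
    """
    if window_end <= window_start or n_bins < 1:
        raise ValueError("window must be non-empty and n_bins >= 1")

    width = (window_end - window_start) / n_bins
    counts = [0] * n_bins
    for t in timestamps:
        if window_start <= t < window_end:
            # Clamp guards against floating-point edge cases near window_end.
            idx = min(int((t - window_start) / width), n_bins - 1)
            counts[idx] += 1

    total = sum(counts)
    if total == 0:
        # No recordings in the window: return 0 rather than an undefined value.
        return 0.0

    entropy = 0.0
    for c in counts:
        if c:
            p = c / total
            entropy -= p * math.log2(p)
    return entropy
```

For example, four recordings spread one per hour over a four-hour window split into four bins yield 2 bits, while the same number of recordings packed into a single hour yield 0.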

(This article belongs to the Special Issue Advanced Machine Learning Algorithms for Biometrics and Its Applications)
