Topic Editors

Prof. Dr. S. Ejaz Ahmed
Department of Mathematics and Statistics, Brock University, St. Catharines, ON L2S 3A1, Canada
Prof. Dr. Abdulkadir Hussein
Department of Mathematics and Statistics, University of Windsor, Windsor, ON, Canada
Dr. Abbas Khalili
Department of Mathematics and Statistics, McGill University, 805 Sherbrooke Street West, Montreal, QC H3A 0B9, Canada

Advances in Data Analytics with Applications to Health Care

Abstract submission deadline: closed (31 December 2022)
Manuscript submission deadline: closed (31 March 2023)
Viewed by 28,973

Topic Information

Dear Colleagues,

As we embark on the second quarter of the 21st century, the landscape of health care data, as well as the analytics tools relevant to it, is changing rapidly. In the early years of statistical learning methodologies, data sets were limited to relatively small samples, and randomized clinical trials were the gold standard for knowledge extraction. Administrative data sets were often dismissed by statisticians as observational data riddled with biases. With the availability of large administrative data sets (streamed from channels such as electronic medical records (EMRs), wearable sensors, disease control and public health organizations, etc.), the data analytics communities (statisticians, computer scientists, epidemiologists, etc.) were compelled to develop new tools for analyzing such data sets and for delivering meaningful information to the public and to the decision makers concerned. These advances have touched all areas of health care data analytics, from mimicking clinical trials via causal inference, to modifying random forests and neural networks to accommodate data with special features such as censored event history data, to models that describe an entire distribution, such as quantile regression. Applications of these new methodologies have benefited a variety of health care domains, leading, for instance, to better personalized health care (e.g., personalized medicine), better telemedicine and automated remote diagnosis, and better predictions of supply chains and product demand in health care (especially during pandemics), among many others.

This Topic aims to provide a glimpse of these advanced tools, with an emphasis on the past two years, in which a global pandemic presented an opportunity to learn about the deficiencies of existing data analytics tools. The topics to be covered include, but are not limited to, the following:

  • New and advanced statistical learning (SL) tools for the analysis of health care data;
  • New and advanced machine learning (ML) tools for the analysis of health care data;
  • Novel ways of applying existing SL and/or ML tools to health care data.

Prof. Dr. S. Ejaz Ahmed
Prof. Dr. Abdulkadir Hussein
Dr. Abbas Khalili
Topic Editors

Participating Journals

Journal Name | Impact Factor | CiteScore | Launched Year | First Decision (median) | APC
Entropy (entropy) | 2.7 | 4.7 | 1999 | 20.8 Days | CHF 2600
Information (information) | 3.1 | 5.8 | 2010 | 18 Days | CHF 1600
Data (data) | 2.6 | 4.6 | 2016 | 22 Days | CHF 1600
International Journal of Environmental Research and Public Health (ijerph) | - | 5.4 | 2004 | 29.6 Days | CHF 2500
Machine Learning and Knowledge Extraction (make) | 3.9 | 8.5 | 2019 | 19.9 Days | CHF 1800

Preprints.org is a multidisciplinary platform providing a preprint service dedicated to sharing your research from the start and empowering your research journey.

MDPI Topics cooperates with Preprints.org and has built a direct connection between MDPI journals and Preprints.org. Authors are encouraged to take advantage of this by posting a preprint at Preprints.org prior to publication:

  1. Immediately share your ideas ahead of publication and establish your research priority;
  2. Protect your idea from being stolen with this time-stamped preprint article;
  3. Enhance the exposure and impact of your research;
  4. Receive feedback from your peers in advance;
  5. Have it indexed in Web of Science (Preprint Citation Index), Google Scholar, Crossref, SHARE, PrePubMed, Scilit and Europe PMC.

Published Papers (12 papers)

15 pages, 1950 KiB  
Article
Drug-Drug Interaction Extraction from Biomedical Text Using Relation BioBERT with BLSTM
by Maryam KafiKang and Abdeltawab Hendawi
Mach. Learn. Knowl. Extr. 2023, 5(2), 669-683; https://doi.org/10.3390/make5020036 - 10 Jun 2023
Cited by 3 | Viewed by 2268
Abstract
In the context of pharmaceuticals, drug-drug interactions (DDIs) occur when two or more drugs interact, potentially altering the intended effects of the drugs and resulting in adverse patient health outcomes. Therefore, it is essential to identify and comprehend these interactions. In recent years, an increasing number of novel compounds have been discovered, resulting in numerous new DDIs. There is a need for effective methods to extract and analyze DDIs, as the majority of this information is still predominantly located in biomedical articles and sources. Despite the development of various techniques, accurately predicting DDIs remains a significant challenge. This paper proposes a novel solution to this problem by leveraging the power of Relation BioBERT (R-BioBERT) to detect and classify DDIs and the Bidirectional Long Short-Term Memory (BLSTM) to improve the accuracy of predictions. In addition to determining whether two drugs interact, the proposed method also identifies the specific types of interactions between them. Results show that the use of BLSTM leads to significantly higher F-scores compared to our baseline model, as demonstrated on three well-known DDI extraction datasets that include SemEval 2013, TAC 2018, and TAC 2019. Full article
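To make the architecture described above concrete, here is a minimal sketch of a BERT-encoder-plus-BiLSTM relation classifier in PyTorch. It assumes the publicly available dmis-lab/biobert-v1.1 checkpoint and a five-class DDI label set; it illustrates the general design only, not the authors' exact model or training setup.

```python
# Minimal sketch: BioBERT encoder + BiLSTM head for DDI relation classification.
# Assumptions: the "dmis-lab/biobert-v1.1" checkpoint and a 5-class DDI label set;
# this is not the paper's exact configuration.
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

class BioBertBLSTM(nn.Module):
    def __init__(self, encoder_name="dmis-lab/biobert-v1.1", num_labels=5, hidden=256):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        self.lstm = nn.LSTM(self.encoder.config.hidden_size, hidden,
                            batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden, num_labels)

    def forward(self, input_ids, attention_mask):
        # Contextual token embeddings from the BioBERT encoder.
        hidden_states = self.encoder(input_ids=input_ids,
                                     attention_mask=attention_mask).last_hidden_state
        # BiLSTM over the token sequence; concatenate final forward/backward states.
        _, (h_n, _) = self.lstm(hidden_states)
        sentence_repr = torch.cat([h_n[-2], h_n[-1]], dim=-1)
        return self.classifier(sentence_repr)

tokenizer = AutoTokenizer.from_pretrained("dmis-lab/biobert-v1.1")
model = BioBertBLSTM()
batch = tokenizer(["@DRUG$ may increase the serum concentration of @DRUG$."],
                  return_tensors="pt", padding=True, truncation=True)
logits = model(batch["input_ids"], batch["attention_mask"])  # one score per DDI class
```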

9 pages, 1849 KiB  
Data Descriptor
The Effect of Short-Term Transcutaneous Electrical Stimulation of Auricular Vagus Nerve on Parameters of Heart Rate Variability
by Vladimir Shvartz, Eldar Sizhazhev, Maria Sokolskaya, Svetlana Koroleva, Soslan Enginoev, Sofia Kruchinova, Elena Shvartz and Elena Golukhova
Data 2023, 8(5), 87; https://doi.org/10.3390/data8050087 - 11 May 2023
Viewed by 2688
Abstract
Many previous studies have demonstrated that transcutaneous vagus nerve stimulation (VNS) has the potential to exhibit therapeutic effects similar to its invasive counterpart. An objective assessment of VNS requires a reliable biomarker of successful vagal activation. Although many potential biomarkers have been proposed, most studies have focused on heart rate variability (HRV). Despite the physiological rationale for HRV as a biomarker for assessing vagal stimulation, data on its effects on HRV are equivocal. To further advance this field, future studies investigating VNS should contain adequate methodological specifics that make it possible to compare the results between studies, to replicate studies, and to enhance the safety of study participants. This article describes the design and methodology of a randomized study evaluating the effect of short-term noninvasive stimulation of the auricular branch of the vagus nerve on parameters of HRV. Primary records of rhythmograms of all the subjects, as well as a dataset with clinical, instrumental, and laboratory data of all the current study subjects are in the public domain for possible secondary analysis to all interested researchers. The physiological interpretation of the obtained data is not considered in the article. Full article
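For readers unfamiliar with HRV parameters, the sketch below computes two standard time-domain measures (SDNN and RMSSD) from a series of RR intervals; the interval values are illustrative and unrelated to the study's dataset.

```python
# Minimal sketch: time-domain HRV parameters (SDNN, RMSSD) from RR intervals.
# The RR values below are illustrative, not data from the study.
import numpy as np

def sdnn(rr_ms):
    """Standard deviation of normal-to-normal RR intervals (ms)."""
    return float(np.std(rr_ms, ddof=1))

def rmssd(rr_ms):
    """Root mean square of successive RR-interval differences (ms)."""
    diffs = np.diff(rr_ms)
    return float(np.sqrt(np.mean(diffs ** 2)))

rr = np.array([812, 790, 805, 830, 798, 815, 801], dtype=float)  # RR intervals in ms
print(f"SDNN  = {sdnn(rr):.1f} ms")
print(f"RMSSD = {rmssd(rr):.1f} ms")
```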

9 pages, 5038 KiB  
Data Descriptor
A Tumour and Liver Automatic Segmentation (ATLAS) Dataset on Contrast-Enhanced Magnetic Resonance Imaging for Hepatocellular Carcinoma
by Félix Quinton, Romain Popoff, Benoît Presles, Sarah Leclerc, Fabrice Meriaudeau, Guillaume Nodari, Olivier Lopez, Julie Pellegrinelli, Olivier Chevallier, Dominique Ginhac, Jean-Marc Vrigneaud and Jean-Louis Alberini
Data 2023, 8(5), 79; https://doi.org/10.3390/data8050079 - 27 Apr 2023
Cited by 3 | Viewed by 3879
Abstract
Liver cancer is the sixth most common cancer in the world and the fourth leading cause of cancer mortality. In unresectable liver cancers, especially hepatocellular carcinoma (HCC), transarterial radioembolisation (TARE) can be considered for treatment. TARE treatment involves a contrast-enhanced magnetic resonance imaging (CE-MRI) exam performed beforehand to delineate the liver and tumour(s) in order to perform dosimetry calculation. Due to the significant amount of time and expertise required to perform the delineation process, there is a strong need for automation. Unfortunately, the lack of publicly available CE-MRI datasets with liver tumour annotations has hindered the development of fully automatic solutions for liver and tumour segmentation. The “Tumour and Liver Automatic Segmentation” (ATLAS) dataset that we present consists of 90 liver-focused CE-MRI covering the entire liver of 90 patients with unresectable HCC, along with 90 liver and liver tumour segmentation masks. To the best of our knowledge, the ATLAS dataset is the first public dataset providing CE-MRI of HCC with annotations. The public availability of this dataset should greatly facilitate the development of automated tools designed to optimise the delineation process, which is essential for treatment planning in liver cancer patients. Full article
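A typical use of such a dataset is to score automatic segmentations against the provided masks. The sketch below computes the Dice overlap between a predicted and a reference mask; the NIfTI file names and the label convention (1 = liver, 2 = tumour) are hypothetical placeholders, not part of the dataset documentation.

```python
# Minimal sketch: Dice similarity coefficient between a predicted and a reference
# segmentation mask. File names and label values are hypothetical placeholders.
import numpy as np
import nibabel as nib

def dice(pred, ref, label=1):
    """Dice overlap for one label in two integer masks of equal shape."""
    p = (pred == label)
    r = (ref == label)
    denom = p.sum() + r.sum()
    return 1.0 if denom == 0 else 2.0 * np.logical_and(p, r).sum() / denom

pred_mask = nib.load("prediction_liver.nii.gz").get_fdata().astype(int)
ref_mask = nib.load("reference_liver.nii.gz").get_fdata().astype(int)
print(f"Liver Dice:  {dice(pred_mask, ref_mask, label=1):.3f}")
print(f"Tumour Dice: {dice(pred_mask, ref_mask, label=2):.3f}")
```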

16 pages, 5839 KiB  
Article
A Diabetes Prediction System Based on Incomplete Fused Data Sources
by Zhaoyi Yuan, Hao Ding, Guoqing Chao, Mingqiang Song, Lei Wang, Weiping Ding and Dianhui Chu
Mach. Learn. Knowl. Extr. 2023, 5(2), 384-399; https://doi.org/10.3390/make5020023 - 10 Apr 2023
Cited by 1 | Viewed by 2087
Abstract
In recent years, the diabetes population has grown younger. Therefore, it has become a key problem to make a timely and effective prediction of diabetes, especially given a single data source. Meanwhile, there are many data sources of diabetes patients collected around the world, and it is extremely important to integrate these heterogeneous data sources to accurately predict diabetes. For the different data sources used to predict diabetes, the predictors may be different. In other words, some special features exist only in certain data sources, which leads to the problem of missing values. Considering the uncertainty of the missing values within the fused dataset, multiple imputation and a method based on graph representation are used to impute the missing values within the fused dataset. The logistic regression model and stacking strategy are applied for diabetes training and prediction on the fused dataset. It is shown that the idea of combining heterogeneous datasets and imputing the missing values produced in the fusion process can effectively improve the performance of diabetes prediction. In addition, the proposed diabetes prediction method can be further extended to any scenario where heterogeneous datasets with the same label types and different feature attributes exist. Full article
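As a rough illustration of the imputation-plus-stacking idea, the sketch below imputes missing values in a fused feature matrix and trains a stacked ensemble with a logistic-regression meta-learner using scikit-learn; the paper's graph-representation imputation step and its actual data sources are not reproduced.

```python
# Minimal sketch: impute missing values in a fused feature matrix, then train a
# stacked classifier whose meta-learner is logistic regression. Synthetic data only;
# the paper's graph-based imputation is not reproduced here.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))
X[rng.random(X.shape) < 0.2] = np.nan            # simulate features missing after fusion
y = (rng.random(500) < 0.3).astype(int)          # synthetic diabetes labels

stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
                ("gb", GradientBoostingClassifier(random_state=0))],
    final_estimator=LogisticRegression(max_iter=1000),
)
model = make_pipeline(IterativeImputer(random_state=0), stack)
print("CV AUC:", cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean())
```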

17 pages, 2439 KiB  
Article
Information Entropy Measures for Evaluation of Reliability of Deep Neural Network Results
by Elakkat D. Gireesh and Varadaraj P. Gurupur
Entropy 2023, 25(4), 573; https://doi.org/10.3390/e25040573 - 27 Mar 2023
Cited by 1 | Viewed by 1823
Abstract
Deep neural networks (DNNs) analyze given data to come up with decisions regarding the inputs. The decision-making process of the DNN model is not entirely transparent. The confidence of the model predictions on new data fed into the network can vary. We address the question of certainty of decision making and the adequacy of information capture by DNN models during this decision-making process. We introduce a measure called the certainty index, which is based on the outputs in the penultimate layer of the DNN. In this approach, we employed iEEG (intracranial electroencephalogram) data to train and test the DNN. When arriving at model predictions, the contribution of the entire information content of the input may be important. We explored the relationship between the certainty of DNN predictions and the information content of the signal by estimating the sample entropy and using a heatmap of the signal. While it can be assumed that the entire sample must be utilized for arriving at the most appropriate decisions, an evaluation of DNNs from this standpoint has not been reported. We demonstrate that the relationship between the certainty index and the sample entropy, assessed through the sample entropy-heatmap correlation, is more robust than that with the original signal, indicating that the DNN focuses on information-rich regions of the signal to arrive at decisions. Therefore, it can be concluded that the certainty of a decision is related to the DNN's ability to capture the information in the original signal. Our results indicate that, within its limitations, the certainty index can be used as a useful tool in estimating the confidence of predictions. The certainty index appears to be related to how effectively DNN heatmaps captured the information content in the signal. Full article
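Since the argument hinges on sample entropy, the sketch below implements that statistic with the usual (m, r) template-matching definition; m = 2 and r = 0.2·std are common defaults, not necessarily the settings used in the paper, and the certainty index itself is not reproduced here.

```python
# Minimal sketch: sample entropy of a 1-D signal using the standard (m, r) definition.
# m=2 and r=0.2*std are common defaults, not necessarily the paper's settings.
import numpy as np

def sample_entropy(x, m=2, r=None):
    x = np.asarray(x, dtype=float)
    n = len(x)
    if r is None:
        r = 0.2 * np.std(x)

    def count_matches(length):
        # n - m overlapping templates of the given length (same count for m and m+1).
        templates = np.array([x[i:i + length] for i in range(n - m)])
        count = 0
        for i in range(len(templates)):
            # Chebyshev distance to every other template; subtract 1 for the self-match.
            dist = np.max(np.abs(templates - templates[i]), axis=1)
            count += np.sum(dist <= r) - 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    return np.inf if a == 0 or b == 0 else -np.log(a / b)

signal = np.sin(np.linspace(0, 8 * np.pi, 400)) + 0.1 * np.random.default_rng(0).normal(size=400)
print(f"SampEn(m=2): {sample_entropy(signal):.3f}")
```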

14 pages, 4736 KiB  
Article
Morphological and Morphometric Analysis of Canine Choroidal Layers Using Spectral Domain Optical Coherence Tomography
by Jowita Zwolska, Ireneusz Balicki and Agnieszka Balicka
Int. J. Environ. Res. Public Health 2023, 20(4), 3121; https://doi.org/10.3390/ijerph20043121 - 10 Feb 2023
Viewed by 1367
Abstract
The choroid, a multifunctional tissue, has been the focus of research interest for many scientists. Its morphology and morphometry facilitate an understanding of pathological processes within both the choroid and retina. This study aimed to determine the choroidal layer thicknesses in healthy, mixed-breed mesocephalic dogs, both male (M) and female (F), using spectral domain optical coherence tomography (SD-OCT) with radial, cross-sectional, and linear scans. The dogs were divided into two groups based on age: middle-aged (MA) and senior (SN). Thicknesses of choroidal layers, namely RPE–Bruch’s membrane–choriocapillaris complex (RPE-BmCc) with tapetum lucidum in the tapetal fundus, the medium-sized vessel layer (MSVL), and the large vessel layer with lamina suprachoroidea (LVLS), as well as whole choroidal thickness (WCT), were measured manually using the caliper function integrated into the OCT software. Measurement was performed dorsally and ventrally at a distance of 5000–6000 μm temporally and nasally at a distance of 4000–7000 μm to the optic disc on enhanced depth scans. The measurements were conducted temporally and nasally in both the tapetal (temporal tapetal: TempT, nasal tapetal: NasT) and nontapetal (temporal nontapetal: TempNT, nasal nontapetal: NasNT) fundus. The ratio of the MSVL thickness to the LVLS thickness for each region was calculated. In all examined dogs, the RPE-BmCc in the dorsal (D) region and MSVL in the Tt region were significantly thicker than those in the other regions. The MSVL was thinner in the ventral (V) region than in the D, TempT, TempNT and NasT regions. The MSVL was significantly thinner in the NasNT region than in the D region. LVLS thickness and WCT were significantly greater in the D and TempT regions than those in the other regions and significantly lesser in the V region than those in the other regions. The MSVL-to-LVLS thickness ratio did not differ between the age groups. Our results reveal that the choroidal thickness profile does not depend on age. Our findings can be used to document the emergence and development of various choroidal diseases in dogs in the future. Full article

11 pages, 886 KiB  
Article
Does Advanced Maternal Age Comprise an Independent Risk Factor for Caesarean Section? A Population-Wide Study
by Anna Šťastná, Tomáš Fait, Jiřina Kocourková and Eva Waldaufová
Int. J. Environ. Res. Public Health 2023, 20(1), 668; https://doi.org/10.3390/ijerph20010668 - 30 Dec 2022
Cited by 4 | Viewed by 1848
Abstract
Objective: To investigate the association between a mother’s age and the risk of caesarean section (CS) when controlling for health factors and selected sociodemographic characteristics. Methods: Binary logistic regression models for all women who gave birth in Czechia in 2018 (N = 111,749 mothers who gave birth to 113,234 children). Results: An increase in the age of a mother significantly increases the odds of a CS birth according to all of the models; depending on the model, OR: 1.62 (95% CI 1.54–1.71) to 1.84 (95% CI 1.70–1.99) for age group 35–39 and OR: 2.83 (95% CI 2.60–3.08) to 3.71 (95% CI 3.23–4.27) for age group 40+ compared to age group 25–29. This strong association between the age of a mother and the risk of CS is further reinforced for primiparas (probability of a CS: 11% for age category ≤ 19, 23% for age category 35–39, and 38% for age category 40+). However, the increasing educational attainment of young women appears to have weakened the influence of increasing maternal age on the overall share of CS births; depending on the model, OR: 0.86 (95% CI 0.80–0.91) to 0.87 (95% CI 0.83–0.91) for tertiary-educated compared to secondary-educated women. Conclusions: The age of a mother comprises an independent risk factor for a CS birth when the influence of health, socioeconomic, and demographic characteristics is considered. Full article
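The odds ratios and confidence intervals quoted above are the standard outputs of a binary logistic regression. As a hedged illustration, the sketch below fits such a model on synthetic data with age-group dummies and a reference category, then exponentiates the coefficients; variable names and values are illustrative, not the Czech registry data.

```python
# Minimal sketch: binary logistic regression for caesarean section (CS) with
# age-group dummies, reporting odds ratios and 95% CIs. Data are synthetic.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "cs": rng.integers(0, 2, 2000),                                 # CS birth yes/no
    "age_group": rng.choice(["25-29", "30-34", "35-39", "40+"], 2000),
    "primipara": rng.integers(0, 2, 2000),
})
# Age group 25-29 is the reference category, mirroring the comparison in the abstract.
model = smf.logit("cs ~ C(age_group, Treatment('25-29')) + primipara", data=df).fit(disp=0)

odds_ratios = pd.DataFrame({
    "OR": np.exp(model.params),
    "2.5%": np.exp(model.conf_int()[0]),
    "97.5%": np.exp(model.conf_int()[1]),
})
print(odds_ratios.round(2))
```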

26 pages, 4206 KiB  
Article
Penalty and Shrinkage Strategies Based on Local Polynomials for Right-Censored Partially Linear Regression
by Syed Ejaz Ahmed, Dursun Aydın and Ersin Yılmaz
Entropy 2022, 24(12), 1833; https://doi.org/10.3390/e24121833 - 15 Dec 2022
Viewed by 1557
Abstract
This study aims to propose modified semiparametric estimators based on six different penalty and shrinkage strategies for the estimation of a right-censored semiparametric regression model. In this context, the methods used to obtain the estimators are the ridge, lasso, adaptive lasso, SCAD, MCP, and elastic net penalty functions. The most important contribution that distinguishes this article from its peers is that it uses the local polynomial method as a smoothing method. The theoretical estimation procedures for the obtained estimators are explained. In addition, a simulation study is performed to examine the behavior of the estimators and make a detailed comparison, and hepatocellular carcinoma data are analyzed as a real data example. As a result of the study, the estimators based on adaptive lasso and SCAD were more robust to censoring and outperformed the other four estimators. Full article
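Three of the six penalties named in the abstract are available in scikit-learn; the sketch below compares ridge, lasso, and elastic-net fits on a synthetic sparse design. SCAD and MCP, the right-censoring adjustment, and the local polynomial smoothing step are not reproduced here.

```python
# Minimal sketch: ridge, lasso and elastic-net fits on a synthetic sparse design.
# SCAD/MCP and the paper's censoring and local-polynomial steps are not shown.
import numpy as np
from sklearn.linear_model import Ridge, Lasso, ElasticNet
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n, p = 200, 30
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = [3.0, -2.0, 1.5, 0.5, 2.5]            # only a few non-zero coefficients
y = X @ beta + rng.normal(scale=1.0, size=n)

for name, est in [("ridge", Ridge(alpha=1.0)),
                  ("lasso", Lasso(alpha=0.1)),
                  ("elastic net", ElasticNet(alpha=0.1, l1_ratio=0.5))]:
    score = cross_val_score(est, X, y, cv=5, scoring="neg_mean_squared_error").mean()
    print(f"{name:12s} CV MSE: {-score:.2f}")
```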

22 pages, 9306 KiB  
Article
In Vitro Major Arterial Cardiovascular Simulator to Generate Benchmark Data Sets for In Silico Model Validation
by Michelle Wisotzki, Alexander Mair, Paul Schlett, Bernhard Lindner, Max Oberhardt and Stefan Bernhard
Data 2022, 7(11), 145; https://doi.org/10.3390/data7110145 - 27 Oct 2022
Cited by 3 | Viewed by 1658
Abstract
Cardiovascular diseases are commonly caused by atherosclerosis, stenosis and aneurysms. Understanding the influence of these pathological conditions on the circulatory mechanism is required to establish methods for early diagnosis. Different tools have been developed to simulate healthy and pathological conditions of blood flow. These simulations are often based on computational models that allow the generation of large data sets for further investigation. However, because computational models often lack some aspects of real-world data, hardware simulators are used to close this gap and generate data for model validation. The aim of this study is to develop and validate a hardware simulator to generate benchmark data sets of healthy and pathological conditions. The development process was led by specific design criteria to allow flexible and physiological simulations. The in vitro hardware simulator includes the major 33 arteries and is driven by a ventricular assist device generating a parametrised in-flow condition at the heart node. Physiologic flow conditions, including heart rate, systolic/diastolic pressure, peripheral resistance and compliance, are adjustable across a wide range. The pressure and flow waves at 17 + 1 locations are measured by inverted fluid-resistant pressure transducers and one ultrasound flow transducer, supporting a detailed analysis of the measurement data even for in silico modelling applications. The pressure and flow waves are compared to in vivo measurements and show physiological conditions. The influence of the degree and location of the stenoses on blood pressure and flow was also investigated. The results indicate decreasing translesional pressure and flow with an increasing degree of stenosis, as expected. The benchmark data set is made available to the research community for validating and comparing different types of computational models. It is hoped that the validation and improvement of computational simulation models will provide better clinical predictions. Full article

12 pages, 2426 KiB  
Article
Predictive Modeling for the Diagnosis of Gestational Diabetes Mellitus Using Epidemiological Data in the United Arab Emirates
by Nasloon Ali, Wasif Khan, Amir Ahmad, Mohammad Mehedy Masud, Hiba Adam and Luai A. Ahmed
Information 2022, 13(10), 485; https://doi.org/10.3390/info13100485 - 10 Oct 2022
Cited by 2 | Viewed by 2090
Abstract
Gestational diabetes mellitus (GDM) is a common condition with repercussions for both the mother and her child. Machine learning (ML) modeling techniques have been proposed to predict the risk of several medical outcomes. A systematic evaluation of the predictive capacity of maternal factors resulting in GDM in the UAE is warranted. Data on a total of 3858 women who gave birth and had information on their GDM status in a birth cohort were used to fit the GDM risk prediction model. Information used for the predictive modeling was drawn from self-reported epidemiological data collected in early gestation. Three different ML models, random forest (RF), gradient boosting model (GBM), and extreme gradient boosting (XGBoost), were used to predict GDM. Furthermore, to provide local interpretation of each feature in GDM diagnosis, features were studied using Shapley additive explanations (SHAP). Results obtained using ML models show that XGBoost, which achieved an AUC of 0.77, performed better compared to RF and GBM. Individual feature importance using SHAP values and the XGBoost model shows that previous GDM diagnosis, maternal age, body mass index, and gravidity play a vital role in GDM diagnosis. ML models using self-reported epidemiological data are useful and feasible in prediction models for GDM diagnosis amongst pregnant women. Such data should be collected periodically in early pregnancy so that health professionals can intervene at earlier stages to prevent adverse outcomes in pregnancy and delivery. The XGBoost algorithm was the optimal model for identifying the features that predict GDM diagnosis. Full article
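The modeling-plus-interpretation workflow described in the abstract can be sketched as follows: fit a gradient-boosted classifier and summarize per-feature SHAP attributions. The feature names and data are illustrative stand-ins, not the UAE cohort variables, and the hyperparameters are arbitrary.

```python
# Minimal sketch: XGBoost classifier plus SHAP feature attributions.
# Feature names and data are illustrative, not the UAE cohort variables.
import numpy as np
import pandas as pd
import xgboost as xgb
import shap

rng = np.random.default_rng(3)
X = pd.DataFrame({
    "previous_gdm": rng.integers(0, 2, 1000),
    "maternal_age": rng.normal(30, 5, 1000),
    "bmi": rng.normal(27, 4, 1000),
    "gravidity": rng.integers(1, 6, 1000),
})
y = (0.8 * X["previous_gdm"] + 0.05 * X["bmi"] + rng.normal(size=1000) > 2.2).astype(int)

model = xgb.XGBClassifier(n_estimators=200, max_depth=3, eval_metric="logloss")
model.fit(X, y)

# Mean absolute SHAP value per feature as a simple global importance summary.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
mean_abs = np.abs(shap_values).mean(axis=0)
print(pd.Series(mean_abs, index=X.columns).sort_values(ascending=False))
```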

8 pages, 1351 KiB  
Data Descriptor
Full-Body Mobility Data to Validate Inertial Measurement Unit Algorithms in Healthy and Neurological Cohorts
by Elke Warmerdam, Clint Hansen, Robbin Romijnders, Markus A. Hobert, Julius Welzel and Walter Maetzler
Data 2022, 7(10), 136; https://doi.org/10.3390/data7100136 - 27 Sep 2022
Cited by 6 | Viewed by 2604
Abstract
Gait and balance dysfunctions are common in neurological disorders and have a negative effect on quality of life. Regularly quantifying these mobility limitations can be used to measure disease progression and the effect of treatment. This information can be used to provide a more individualized treatment. Inertial measurement units (IMUs) can be utilized to quantify mobility in different contexts. However, algorithms are required to extract valuable parameters out of the raw IMU data. These algorithms need to be validated to make sure that they extract the features they should extract. This validation should be performed per disease since different mobility limitations or symptoms can influence the performance of an algorithm in different ways. Therefore, this dataset contains data from both healthy subjects and patients with neurological diseases (Parkinson’s disease, stroke, multiple sclerosis, chronic low back pain). The full bodies of 167 subjects were measured with IMUs and an optical motion capture (reference) system. Subjects performed multiple standardized mobility assessments and non-standardized activities of daily living. The data of 21 healthy subjects are shared online, data of the other subjects and patients can only be obtained after contacting the corresponding author and signing a data sharing agreement. Full article
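One common way to validate an IMU-derived parameter against an optical reference of this kind is a Bland-Altman analysis; the sketch below computes the bias and limits of agreement for a synthetic step-time parameter. This is a generic validation recipe, not a procedure taken from the dataset description.

```python
# Minimal sketch: Bland-Altman bias and limits of agreement between an IMU-derived
# parameter (e.g., step time) and the optical motion-capture reference. Synthetic data.
import numpy as np

rng = np.random.default_rng(4)
reference = rng.normal(0.55, 0.05, 100)            # reference step times (s)
imu = reference + rng.normal(0.005, 0.01, 100)     # IMU estimates with small bias and noise

diff = imu - reference
bias = diff.mean()
loa = 1.96 * diff.std(ddof=1)                      # half-width of the limits of agreement
print(f"Bias: {bias * 1000:.1f} ms, limits of agreement: +/- {loa * 1000:.1f} ms")
```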

28 pages, 3820 KiB  
Article
Supervised Learning Models for the Preliminary Detection of COVID-19 in Patients Using Demographic and Epidemiological Parameters
by Aditya Pradhan, Srikanth Prabhu, Krishnaraj Chadaga, Saptarshi Sengupta and Gopal Nath
Information 2022, 13(7), 330; https://doi.org/10.3390/info13070330 - 10 Jul 2022
Cited by 20 | Viewed by 3120
Abstract
The World Health Organization declared the COVID-19 outbreak a public health emergency of international concern on 30 January 2020 and characterized it as a global pandemic in March 2020. It has had catastrophic consequences for the world economy and the well-being of people and has put a tremendous strain on already-scarce healthcare systems globally, particularly in underdeveloped countries. Over 11 billion vaccine doses have already been administered worldwide, and the benefits of these vaccinations will take some time to appear. Today, the only practical approach to diagnosing COVID-19 is through the RT-PCR and RAT tests, which have sometimes been known to give unreliable results. Timely diagnosis and implementation of precautionary measures will likely improve the survival outcome and decrease the fatality rates. In this study, we propose an innovative way to predict COVID-19 with the help of alternative non-clinical methods such as supervised machine learning models to identify the patients at risk based on their characteristic parameters and underlying comorbidities. Medical records of patients from Mexico admitted between 23 January 2020 and 26 March 2022 were chosen for this purpose. Among several supervised machine learning approaches tested, the XGBoost model achieved the best results with an accuracy of 92%. It is an easy, non-invasive, inexpensive, instant and accurate way of forecasting those at risk of contracting the virus. However, it is too early to conclude that this method can be used as an alternative in the clinical diagnosis of coronavirus cases. Full article
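As a rough illustration of the kind of model comparison reported above, the sketch below trains several supervised classifiers on synthetic demographic and comorbidity features and reports held-out accuracy; the data, features, and hyperparameters are invented for the example and are unrelated to the Mexican records.

```python
# Minimal sketch: compare several supervised classifiers on synthetic
# demographic/comorbidity features; not the Mexican patient records.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

rng = np.random.default_rng(5)
n = 2000
age = rng.integers(18, 90, n)
comorbidities = rng.integers(0, 2, (n, 4))         # e.g., diabetes, hypertension, obesity, asthma
X = np.column_stack([age, comorbidities])
y = ((age > 60) | (comorbidities.sum(axis=1) >= 2)).astype(int)
y = np.where(rng.random(n) < 0.1, 1 - y, y)        # add label noise so the task is non-trivial

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "xgboost": XGBClassifier(n_estimators=200, max_depth=3, eval_metric="logloss"),
}
for name, clf in models.items():
    clf.fit(X_train, y_train)
    acc = accuracy_score(y_test, clf.predict(X_test))
    print(f"{name:20s} accuracy: {acc:.2f}")
```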
