Topic Editors

Prof. Dr. S. Ejaz Ahmed
Department of Mathematics and Statistics, Brock University, St. Catharines, ON L2S 3A1, Canada
Prof. Dr. Abdulkadir Hussein
Department of Mathematics and Statistics, University of Windsor, Windsor, ON, Canada
Dr. Abbas Khalili
Department of Mathematics and Statistics, McGill University, 805 Sherbrooke Street West, Montreal, QC H3A 0B9, Canada

Advances in Data Analytics with Applications to Health Care

Abstract submission deadline: closed (31 December 2022)
Manuscript submission deadline: closed (31 March 2023)
Viewed by 28,973

Topic Information

Dear Colleagues,

As we embark on the second quarter of the 21st century, the landscape of health care data, as well as the analytics tools relevant to it, is changing rapidly. In the early years of statistical learning methodologies, data sets were limited to relatively small samples, and randomized clinical trials were the gold standard for knowledge extraction. Administrative data sets were often dismissed by statisticians as observational data riddled with biases. With the availability of large administrative data sets (streamed from channels such as electronic medical records (EMRs), wearable sensors, disease control and public health organizations, etc.), the data analytics communities (statisticians, computer scientists, epidemiologists, etc.) were compelled to develop new tools for analyzing such data sets and for delivering meaningful information to the public and to the decision makers concerned. These advances have touched all areas of health care data analytics, from mimicking clinical trials via causal inference, to modifying random forests and neural networks to accommodate data with special features such as censored event history data, to models that describe an entire distribution, such as quantile regression. Applications of these new methodologies have benefited a variety of health care domains, leading, for instance, to better personalized health care (e.g., personalized medicine), better telemedicine and automated remote diagnosis, and better predictions of supply chains and product demand in health care (especially during pandemics), among many others.

This Topic aims to provide a glimpse of these advanced tools, with an emphasis on the past two years, in which a global pandemic presented an opportunity to learn about the deficiencies of existing data analytics tools. The topics to be covered include, but are not limited to, the following:

  • New and advanced statistical learning (SL) tools for the analysis of health care data;
  • New and advanced machine learning (ML) tools for the analysis of health care data;
  • Novel ways of applying existing SL and/or ML tools to health care data.

Prof. Dr. S. Ejaz Ahmed
Prof. Dr. Abdulkadir Hussein
Dr. Abbas Khalili
Topic Editors

Participating Journals

Journal Name | Impact Factor | CiteScore | Launched Year | First Decision (median) | APC
Entropy (entropy) | 2.7 | 4.7 | 1999 | 20.8 Days | CHF 2600
Information (information) | 3.1 | 5.8 | 2010 | 18 Days | CHF 1600
Data (data) | 2.6 | 4.6 | 2016 | 22 Days | CHF 1600
International Journal of Environmental Research and Public Health (ijerph) | - | 5.4 | 2004 | 29.6 Days | CHF 2500
Machine Learning and Knowledge Extraction (make) | 3.9 | 8.5 | 2019 | 19.9 Days | CHF 1800

Preprints.org is a multidisciplinary platform providing a preprint service dedicated to sharing your research from the start and empowering your research journey.

MDPI Topics cooperates with Preprints.org and has built a direct connection between MDPI journals and Preprints.org. Authors are encouraged to take advantage of this by posting a preprint at Preprints.org prior to publication:

  1. Immediately share your ideas ahead of publication and establish your research priority;
  2. Protect your idea from being stolen with this time-stamped preprint article;
  3. Enhance the exposure and impact of your research;
  4. Receive feedback from your peers in advance;
  5. Have it indexed in Web of Science (Preprint Citation Index), Google Scholar, Crossref, SHARE, PrePubMed, Scilit and Europe PMC.

Published Papers (12 papers)

15 pages, 1950 KiB  
Article
Drug-Drug Interaction Extraction from Biomedical Text Using Relation BioBERT with BLSTM
by Maryam KafiKang and Abdeltawab Hendawi
Mach. Learn. Knowl. Extr. 2023, 5(2), 669-683; https://doi.org/10.3390/make5020036 - 10 Jun 2023
Cited by 3 | Viewed by 2268
Abstract
In the context of pharmaceuticals, drug-drug interactions (DDIs) occur when two or more drugs interact, potentially altering the intended effects of the drugs and resulting in adverse patient health outcomes. Therefore, it is essential to identify and comprehend these interactions. In recent years, an increasing number of novel compounds have been discovered, resulting in numerous new DDIs. There is a need for effective methods to extract and analyze DDIs, as the majority of this information is still predominantly located in biomedical articles and sources. Despite the development of various techniques, accurately predicting DDIs remains a significant challenge. This paper proposes a novel solution to this problem by leveraging the power of Relation BioBERT (R-BioBERT) to detect and classify DDIs and the Bidirectional Long Short-Term Memory (BLSTM) to improve the accuracy of predictions. In addition to determining whether two drugs interact, the proposed method also identifies the specific types of interactions between them. Results show that the use of BLSTM leads to significantly higher F-scores compared to our baseline model, as demonstrated on three well-known DDI extraction datasets that include SemEval 2013, TAC 2018, and TAC 2019. Full article
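To make the architecture described above concrete, here is a minimal sketch of a BERT-encoder-plus-BiLSTM relation classifier in PyTorch. It assumes the publicly available dmis-lab/biobert-v1.1 checkpoint and a five-class DDI label set; it illustrates the general design only, not the authors' exact model or training setup.

```python
# Minimal sketch: BioBERT encoder + BiLSTM head for DDI relation classification.
# Assumptions: the "dmis-lab/biobert-v1.1" checkpoint and a 5-class DDI label set;
# this is not the paper's exact configuration.
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

class BioBertBLSTM(nn.Module):
    def __init__(self, encoder_name="dmis-lab/biobert-v1.1", num_labels=5, hidden=256):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        self.lstm = nn.LSTM(self.encoder.config.hidden_size, hidden,
                            batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden, num_labels)

    def forward(self, input_ids, attention_mask):
        # Contextual token embeddings from the BioBERT encoder.
        hidden_states = self.encoder(input_ids=input_ids,
                                     attention_mask=attention_mask).last_hidden_state
        # BiLSTM over the token sequence; concatenate final forward/backward states.
        _, (h_n, _) = self.lstm(hidden_states)
        sentence_repr = torch.cat([h_n[-2], h_n[-1]], dim=-1)
        return self.classifier(sentence_repr)

tokenizer = AutoTokenizer.from_pretrained("dmis-lab/biobert-v1.1")
model = BioBertBLSTM()
batch = tokenizer(["@DRUG$ may increase the serum concentration of @DRUG$."],
                  return_tensors="pt", padding=True, truncation=True)
logits = model(batch["input_ids"], batch["attention_mask"])  # one score per DDI class
```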

9 pages, 1849 KiB  
Data Descriptor
The Effect of Short-Term Transcutaneous Electrical Stimulation of Auricular Vagus Nerve on Parameters of Heart Rate Variability
by Vladimir Shvartz, Eldar Sizhazhev, Maria Sokolskaya, Svetlana Koroleva, Soslan Enginoev, Sofia Kruchinova, Elena Shvartz and Elena Golukhova
Data 2023, 8(5), 87; https://doi.org/10.3390/data8050087 - 11 May 2023
Viewed by 2688
Abstract
Many previous studies have demonstrated that transcutaneous vagus nerve stimulation (VNS) has the potential to exhibit therapeutic effects similar to its invasive counterpart. An objective assessment of VNS requires a reliable biomarker of successful vagal activation. Although many potential biomarkers have been proposed, most studies have focused on heart rate variability (HRV). Despite the physiological rationale for HRV as a biomarker for assessing vagal stimulation, data on its effects on HRV are equivocal. To further advance this field, future studies investigating VNS should contain adequate methodological specifics that make it possible to compare the results between studies, to replicate studies, and to enhance the safety of study participants. This article describes the design and methodology of a randomized study evaluating the effect of short-term noninvasive stimulation of the auricular branch of the vagus nerve on parameters of HRV. Primary records of rhythmograms of all the subjects, as well as a dataset with clinical, instrumental, and laboratory data of all the current study subjects are in the public domain for possible secondary analysis to all interested researchers. The physiological interpretation of the obtained data is not considered in the article. Full article
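For readers unfamiliar with HRV parameters, the sketch below computes two standard time-domain measures (SDNN and RMSSD) from a series of RR intervals; the interval values are illustrative and unrelated to the study's dataset.

```python
# Minimal sketch: time-domain HRV parameters (SDNN, RMSSD) from RR intervals.
# The RR values below are illustrative, not data from the study.
import numpy as np

def sdnn(rr_ms):
    """Standard deviation of normal-to-normal RR intervals (ms)."""
    return float(np.std(rr_ms, ddof=1))

def rmssd(rr_ms):
    """Root mean square of successive RR-interval differences (ms)."""
    diffs = np.diff(rr_ms)
    return float(np.sqrt(np.mean(diffs ** 2)))

rr = np.array([812, 790, 805, 830, 798, 815, 801], dtype=float)  # RR intervals in ms
print(f"SDNN  = {sdnn(rr):.1f} ms")
print(f"RMSSD = {rmssd(rr):.1f} ms")
```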

9 pages, 5038 KiB  
Data Descriptor
A Tumour and Liver Automatic Segmentation (ATLAS) Dataset on Contrast-Enhanced Magnetic Resonance Imaging for Hepatocellular Carcinoma
by Félix Quinton, Romain Popoff, Benoît Presles, Sarah Leclerc, Fabrice Meriaudeau, Guillaume Nodari, Olivier Lopez, Julie Pellegrinelli, Olivier Chevallier, Dominique Ginhac, Jean-Marc Vrigneaud and Jean-Louis Alberini
Data 2023, 8(5), 79; https://doi.org/10.3390/data8050079 - 27 Apr 2023
Cited by 3 | Viewed by 3879
Abstract
Liver cancer is the sixth most common cancer in the world and the fourth leading cause of cancer mortality. In unresectable liver cancers, especially hepatocellular carcinoma (HCC), transarterial radioembolisation (TARE) can be considered for treatment. TARE treatment involves a contrast-enhanced magnetic resonance imaging (CE-MRI) exam performed beforehand to delineate the liver and tumour(s) in order to perform dosimetry calculation. Due to the significant amount of time and expertise required to perform the delineation process, there is a strong need for automation. Unfortunately, the lack of publicly available CE-MRI datasets with liver tumour annotations has hindered the development of fully automatic solutions for liver and tumour segmentation. The “Tumour and Liver Automatic Segmentation” (ATLAS) dataset that we present consists of 90 liver-focused CE-MRI covering the entire liver of 90 patients with unresectable HCC, along with 90 liver and liver tumour segmentation masks. To the best of our knowledge, the ATLAS dataset is the first public dataset providing CE-MRI of HCC with annotations. The public availability of this dataset should greatly facilitate the development of automated tools designed to optimise the delineation process, which is essential for treatment planning in liver cancer patients. Full article
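A typical use of such a dataset is to score automatic segmentations against the provided masks. The sketch below computes the Dice overlap between a predicted and a reference mask; the NIfTI file names and the label convention (1 = liver, 2 = tumour) are hypothetical placeholders, not part of the dataset documentation.

```python
# Minimal sketch: Dice similarity coefficient between a predicted and a reference
# segmentation mask. File names and label values are hypothetical placeholders.
import numpy as np
import nibabel as nib

def dice(pred, ref, label=1):
    """Dice overlap for one label in two integer masks of equal shape."""
    p = (pred == label)
    r = (ref == label)
    denom = p.sum() + r.sum()
    return 1.0 if denom == 0 else 2.0 * np.logical_and(p, r).sum() / denom

pred_mask = nib.load("prediction_liver.nii.gz").get_fdata().astype(int)
ref_mask = nib.load("reference_liver.nii.gz").get_fdata().astype(int)
print(f"Liver Dice:  {dice(pred_mask, ref_mask, label=1):.3f}")
print(f"Tumour Dice: {dice(pred_mask, ref_mask, label=2):.3f}")
```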

16 pages, 5839 KiB  
Article
A Diabetes Prediction System Based on Incomplete Fused Data Sources
by Zhaoyi Yuan, Hao Ding, Guoqing Chao, Mingqiang Song, Lei Wang, Weiping Ding and Dianhui Chu
Mach. Learn. Knowl. Extr. 2023, 5(2), 384-399; https://doi.org/10.3390/make5020023 - 10 Apr 2023
Cited by 1 | Viewed by 2087
Abstract
In recent years, the diabetes population has grown younger. Therefore, it has become a key problem to make a timely and effective prediction of diabetes, especially given a single data source. Meanwhile, there are many data sources of diabetes patients collected around the world, and it is extremely important to integrate these heterogeneous data sources to accurately predict diabetes. For the different data sources used to predict diabetes, the predictors may be different. In other words, some special features exist only in certain data sources, which leads to the problem of missing values. Considering the uncertainty of the missing values within the fused dataset, multiple imputation and a method based on graph representation are used to impute the missing values within the fused dataset. The logistic regression model and stacking strategy are applied for diabetes training and prediction on the fused dataset. It is shown that the idea of combining heterogeneous datasets and imputing the missing values produced in the fusion process can effectively improve the performance of diabetes prediction. In addition, the proposed diabetes prediction method can be further extended to any scenario where heterogeneous datasets with the same label types and different feature attributes exist. Full article
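As a rough illustration of the imputation-plus-stacking idea, the sketch below imputes missing values in a fused feature matrix and trains a stacked ensemble with a logistic-regression meta-learner using scikit-learn; the paper's graph-representation imputation step and its actual data sources are not reproduced.

```python
# Minimal sketch: impute missing values in a fused feature matrix, then train a
# stacked classifier whose meta-learner is logistic regression. Synthetic data only;
# the paper's graph-based imputation is not reproduced here.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))
X[rng.random(X.shape) < 0.2] = np.nan            # simulate features missing after fusion
y = (rng.random(500) < 0.3).astype(int)          # synthetic diabetes labels

stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
                ("gb", GradientBoostingClassifier(random_state=0))],
    final_estimator=LogisticRegression(max_iter=1000),
)
model = make_pipeline(IterativeImputer(random_state=0), stack)
print("CV AUC:", cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean())
```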

17 pages, 2439 KiB  
Article
Information Entropy Measures for Evaluation of Reliability of Deep Neural Network Results
by Elakkat D. Gireesh and Varadaraj P. Gurupur
Entropy 2023, 25(4), 573; https://doi.org/10.3390/e25040573 - 27 Mar 2023
Cited by 1 | Viewed by 1823
Abstract
Deep neural networks (DNNs) analyze given data to come up with decisions regarding the inputs. The decision-making process of the DNN model is not entirely transparent. The confidence of the model predictions on new data fed into the network can vary. We address the question of certainty of decision making and the adequacy of information capture by DNN models during this decision-making process. We introduce a measure called the certainty index, which is based on the outputs in the penultimate layer of the DNN. In this approach, we employed iEEG (intracranial electroencephalogram) data to train and test the DNN. When arriving at model predictions, the contribution of the entire information content of the input may be important. We explored the relationship between the certainty of DNN predictions and the information content of the signal by estimating the sample entropy and using a heatmap of the signal. While it can be assumed that the entire sample must be utilized for arriving at the most appropriate decisions, an evaluation of DNNs from this standpoint has not been reported. We demonstrate that the relationship between the certainty index and the sample entropy, assessed through the sample entropy-heatmap correlation, is more robust than that with the original signal, indicating that the DNN focuses on information-rich regions of the signal to arrive at decisions. Therefore, it can be concluded that the certainty of a decision is related to the DNN's ability to capture the information in the original signal. Our results indicate that, within its limitations, the certainty index can be used as a useful tool in estimating the confidence of predictions. The certainty index appears to be related to how effectively DNN heatmaps captured the information content in the signal. Full article
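Since the argument hinges on sample entropy, the sketch below implements that statistic with the usual (m, r) template-matching definition; m = 2 and r = 0.2·std are common defaults, not necessarily the settings used in the paper, and the certainty index itself is not reproduced here.

```python
# Minimal sketch: sample entropy of a 1-D signal using the standard (m, r) definition.
# m=2 and r=0.2*std are common defaults, not necessarily the paper's settings.
import numpy as np

def sample_entropy(x, m=2, r=None):
    x = np.asarray(x, dtype=float)
    n = len(x)
    if r is None:
        r = 0.2 * np.std(x)

    def count_matches(length):
        # n - m overlapping templates of the given length (same count for m and m+1).
        templates = np.array([x[i:i + length] for i in range(n - m)])
        count = 0
        for i in range(len(templates)):
            # Chebyshev distance to every other template; subtract 1 for the self-match.
            dist = np.max(np.abs(templates - templates[i]), axis=1)
            count += np.sum(dist <= r) - 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    return np.inf if a == 0 or b == 0 else -np.log(a / b)

signal = np.sin(np.linspace(0, 8 * np.pi, 400)) + 0.1 * np.random.default_rng(0).normal(size=400)
print(f"SampEn(m=2): {sample_entropy(signal):.3f}")
```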

14 pages, 4736 KiB  
Article
Morphological and Morphometric Analysis of Canine Choroidal Layers Using Spectral Domain Optical Coherence Tomography
by Jowita Zwolska, Ireneusz Balicki and Agnieszka Balicka
Int. J. Environ. Res. Public Health 2023, 20(4), 3121; https://doi.org/10.3390/ijerph20043121 - 10 Feb 2023
Viewed by 1367
Abstract
The choroid, a multifunctional tissue, has been the focus of research interest for many scientists. Its morphology and morphometry facilitate an understanding of pathological processes within both the choroid and retina. This study aimed to determine the choroidal layer thicknesses in healthy, mixed-breed mesocephalic dogs, both male (M) and female (F), using spectral domain optical coherence tomography (SD-OCT) with radial, cross-sectional, and linear scans. The dogs were divided into two groups based on age: middle-aged (MA) and senior (SN). Thicknesses of choroidal layers, namely RPE–Bruch’s membrane–choriocapillaris complex (RPE-BmCc) with tapetum lucidum in the tapetal fundus, the medium-sized vessel layer (MSVL), and the large vessel layer with lamina suprachoroidea (LVLS), as well as whole choroidal thickness (WCT), were measured manually using the caliper function integrated into the OCT software. Measurement was performed dorsally and ventrally at a distance of 5000–6000 μm temporally and nasally at a distance of 4000–7000 μm to the optic disc on enhanced depth scans. The measurements were conducted temporally and nasally in both the tapetal (temporal tapetal: TempT, nasal tapetal: NasT) and nontapetal (temporal nontapetal: TempNT, nasal nontapetal: NasNT) fundus. The ratio of the MSVL thickness to the LVLS thickness for each region was calculated. In all examined dogs, the RPE-BmCc in the dorsal (D) region and MSVL in the Tt region were significantly thicker than those in the other regions. The MSVL was thinner in the ventral (V) region than in the D, TempT, TempNT and NasT regions. The MSVL was significantly thinner in the NasNT region than in the D region. LVLS thickness and WCT were significantly greater in the D and TempT regions than those in the other regions and significantly lesser in the V region than those in the other regions. The MSVL-to-LVLS thickness ratio did not differ between the age groups. Our results reveal that the choroidal thickness profile does not depend on age. Our findings can be used to document the emergence and development of various choroidal diseases in dogs in the future. Full article

11 pages, 886 KiB  
Article
Does Advanced Maternal Age Comprise an Independent Risk Factor for Caesarean Section? A Population-Wide Study
by Anna Šťastná, Tomáš Fait, Jiřina Kocourková and Eva Waldaufová
Int. J. Environ. Res. Public Health 2023, 20(1), 668; https://doi.org/10.3390/ijerph20010668 - 30 Dec 2022
Cited by 4 | Viewed by 1848
Abstract
Objective: To investigate the association between a mother’s age and the risk of caesarean section (CS) when controlling for health factors and selected sociodemographic characteristics. Methods: Binary logistic regression models for all women who gave birth in Czechia in 2018 (N = 111,749 mothers who gave birth to 113,234 children). Results: An increase in the age of a mother significantly increases the odds of a CS birth according to all of the models; depending on the model, OR: 1.62 (95% CI 1.54–1.71) to 1.84 (95% CI 1.70–1.99) for age group 35–39 and OR: 2.83 (95% CI 2.60–3.08) to 3.71 (95% CI 3.23–4.27) for age group 40+ compared to age group 25–29. This strong association between the age of a mother and the risk of CS is further reinforced for primiparas (probability of a CS: 11% for age category ≤ 19, 23% for age category 35–39, and 38% for age category 40+). However, the increasing educational attainment of young women appears to have weakened the influence of increasing maternal age on the overall share of CS births; depending on the model, OR: 0.86 (95% CI 0.80–0.91) to 0.87 (95% CI 0.83–0.91) for tertiary-educated compared to secondary-educated women. Conclusions: The age of a mother comprises an independent risk factor for a CS birth when the influence of health, socioeconomic, and demographic characteristics is considered. Full article
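The odds ratios and confidence intervals quoted above are the standard outputs of a binary logistic regression. As a hedged illustration, the sketch below fits such a model on synthetic data with age-group dummies and a reference category, then exponentiates the coefficients; variable names and values are illustrative, not the Czech registry data.

```python
# Minimal sketch: binary logistic regression for caesarean section (CS) with
# age-group dummies, reporting odds ratios and 95% CIs. Data are synthetic.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "cs": rng.integers(0, 2, 2000),                                 # CS birth yes/no
    "age_group": rng.choice(["25-29", "30-34", "35-39", "40+"], 2000),
    "primipara": rng.integers(0, 2, 2000),
})
# Age group 25-29 is the reference category, mirroring the comparison in the abstract.
model = smf.logit("cs ~ C(age_group, Treatment('25-29')) + primipara", data=df).fit(disp=0)

odds_ratios = pd.DataFrame({
    "OR": np.exp(model.params),
    "2.5%": np.exp(model.conf_int()[0]),
    "97.5%": np.exp(model.conf_int()[1]),
})
print(odds_ratios.round(2))
```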

26 pages, 4206 KiB  
Article
Penalty and Shrinkage Strategies Based on Local Polynomials for Right-Censored Partially Linear Regression
by Syed Ejaz Ahmed, Dursun Aydın and Ersin Yılmaz
Entropy 2022, 24(12), 1833; https://doi.org/10.3390/e24121833 - 15 Dec 2022
Viewed by 1557
Abstract
This study aims to propose modified semiparametric estimators based on six different penalty and shrinkage strategies for the estimation of a right-censored semiparametric regression model. In this context, the methods used to obtain the estimators are the ridge, lasso, adaptive lasso, SCAD, MCP, and elastic net penalty functions. The most important contribution that distinguishes this article from its peers is that it uses the local polynomial method as a smoothing method. The theoretical estimation procedures for the obtained estimators are explained. In addition, a simulation study is performed to examine the behavior of the estimators and make a detailed comparison, and hepatocellular carcinoma data are analyzed as a real data example. As a result of the study, the estimators based on adaptive lasso and SCAD were more robust to censoring and outperformed the other four estimators. Full article
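Three of the six penalties named in the abstract are available in scikit-learn; the sketch below compares ridge, lasso, and elastic-net fits on a synthetic sparse design. SCAD and MCP, the right-censoring adjustment, and the local polynomial smoothing step are not reproduced here.

```python
# Minimal sketch: ridge, lasso and elastic-net fits on a synthetic sparse design.
# SCAD/MCP and the paper's censoring and local-polynomial steps are not shown.
import numpy as np
from sklearn.linear_model import Ridge, Lasso, ElasticNet
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n, p = 200, 30
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = [3.0, -2.0, 1.5, 0.5, 2.5]            # only a few non-zero coefficients
y = X @ beta + rng.normal(scale=1.0, size=n)

for name, est in [("ridge", Ridge(alpha=1.0)),
                  ("lasso", Lasso(alpha=0.1)),
                  ("elastic net", ElasticNet(alpha=0.1, l1_ratio=0.5))]:
    score = cross_val_score(est, X, y, cv=5, scoring="neg_mean_squared_error").mean()
    print(f"{name:12s} CV MSE: {-score:.2f}")
```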

22 pages, 9306 KiB  
Article
In Vitro Major Arterial Cardiovascular Simulator to Generate Benchmark Data Sets for In Silico Model Validation
by Michelle Wisotzki, Alexander Mair, Paul Schlett, Bernhard Lindner, Max Oberhardt and Stefan Bernhard
Data 2022, 7(11), 145; https://doi.org/10.3390/data7110145 - 27 Oct 2022
Cited by 3 | Viewed by 1658
Abstract
Cardiovascular diseases are commonly caused by atherosclerosis, stenosis and aneurysms. Understanding the influence of these pathological conditions on the circulatory mechanism is required to establish methods for early diagnosis. Different tools have been developed to simulate healthy and pathological conditions of blood flow. These simulations are often based on computational models that allow the generation of large data sets for further investigation. However, because computational models often lack some aspects of real-world data, hardware simulators are used to close this gap and generate data for model validation. The aim of this study is to develop and validate a hardware simulator to generate benchmark data sets of healthy and pathological conditions. The development process was led by specific design criteria to allow flexible and physiological simulations. The in vitro hardware simulator includes the major 33 arteries and is driven by a ventricular assist device generating a parametrised in-flow condition at the heart node. Physiologic flow conditions, including heart rate, systolic/diastolic pressure, peripheral resistance and compliance, are adjustable across a wide range. The pressure and flow waves at 17 + 1 locations are measured by inverted fluid-resistant pressure transducers and one ultrasound flow transducer, supporting a detailed analysis of the measurement data even for in silico modelling applications. The pressure and flow waves are compared to in vivo measurements and show physiological conditions. The influence of the degree and location of the stenoses on blood pressure and flow was also investigated. The results indicate decreasing translesional pressure and flow with an increasing degree of stenosis, as expected. The benchmark data set is made available to the research community for validating and comparing different types of computational models. It is hoped that the validation and improvement of computational simulation models will provide better clinical predictions. Full article

12 pages, 2426 KiB  
Article
Predictive Modeling for the Diagnosis of Gestational Diabetes Mellitus Using Epidemiological Data in the United Arab Emirates
by Nasloon Ali, Wasif Khan, Amir Ahmad, Mohammad Mehedy Masud, Hiba Adam and Luai A. Ahmed
Information 2022, 13(10), 485; https://doi.org/10.3390/info13100485 - 10 Oct 2022
Cited by 2 | Viewed by 2090
Abstract
Gestational diabetes mellitus (GDM) is a common condition with repercussions for both the mother and her child. Machine learning (ML) modeling techniques have been proposed to predict the risk of several medical outcomes. A systematic evaluation of the predictive capacity of maternal factors resulting in GDM in the UAE is warranted. Data on a total of 3858 women who gave birth and had information on their GDM status in a birth cohort were used to fit the GDM risk prediction model. Information used for the predictive modeling was drawn from self-reported epidemiological data collected in early gestation. Three different ML models, random forest (RF), gradient boosting model (GBM), and extreme gradient boosting (XGBoost), were used to predict GDM. Furthermore, to provide local interpretation of each feature in GDM diagnosis, features were studied using Shapley additive explanations (SHAP). Results obtained using ML models show that XGBoost, which achieved an AUC of 0.77, performed better compared to RF and GBM. Individual feature importance using SHAP values and the XGBoost model shows that previous GDM diagnosis, maternal age, body mass index, and gravidity play a vital role in GDM diagnosis. ML models using self-reported epidemiological data are useful and feasible in prediction models for GDM diagnosis amongst pregnant women. Such data should be collected periodically in early pregnancy so that health professionals can intervene at earlier stages to prevent adverse outcomes in pregnancy and delivery. The XGBoost algorithm was the optimal model for identifying the features that predict GDM diagnosis. Full article
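The modeling-plus-interpretation workflow described in the abstract can be sketched as follows: fit a gradient-boosted classifier and summarize per-feature SHAP attributions. The feature names and data are illustrative stand-ins, not the UAE cohort variables, and the hyperparameters are arbitrary.

```python
# Minimal sketch: XGBoost classifier plus SHAP feature attributions.
# Feature names and data are illustrative, not the UAE cohort variables.
import numpy as np
import pandas as pd
import xgboost as xgb
import shap

rng = np.random.default_rng(3)
X = pd.DataFrame({
    "previous_gdm": rng.integers(0, 2, 1000),
    "maternal_age": rng.normal(30, 5, 1000),
    "bmi": rng.normal(27, 4, 1000),
    "gravidity": rng.integers(1, 6, 1000),
})
y = (0.8 * X["previous_gdm"] + 0.05 * X["bmi"] + rng.normal(size=1000) > 2.2).astype(int)

model = xgb.XGBClassifier(n_estimators=200, max_depth=3, eval_metric="logloss")
model.fit(X, y)

# Mean absolute SHAP value per feature as a simple global importance summary.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
mean_abs = np.abs(shap_values).mean(axis=0)
print(pd.Series(mean_abs, index=X.columns).sort_values(ascending=False))
```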

8 pages, 1351 KiB  
Data Descriptor
Full-Body Mobility Data to Validate Inertial Measurement Unit Algorithms in Healthy and Neurological Cohorts
by Elke Warmerdam, Clint Hansen, Robbin Romijnders, Markus A. Hobert, Julius Welzel and Walter Maetzler
Data 2022, 7(10), 136; https://doi.org/10.3390/data7100136 - 27 Sep 2022
Cited by 6 | Viewed by 2604
Abstract
Gait and balance dysfunctions are common in neurological disorders and have a negative effect on quality of life. Regularly quantifying these mobility limitations can be used to measure disease progression and the effect of treatment. This information can be used to provide a more individualized treatment. Inertial measurement units (IMUs) can be utilized to quantify mobility in different contexts. However, algorithms are required to extract valuable parameters out of the raw IMU data. These algorithms need to be validated to make sure that they extract the features they should extract. This validation should be performed per disease since different mobility limitations or symptoms can influence the performance of an algorithm in different ways. Therefore, this dataset contains data from both healthy subjects and patients with neurological diseases (Parkinson’s disease, stroke, multiple sclerosis, chronic low back pain). The full bodies of 167 subjects were measured with IMUs and an optical motion capture (reference) system. Subjects performed multiple standardized mobility assessments and non-standardized activities of daily living. The data of 21 healthy subjects are shared online, data of the other subjects and patients can only be obtained after contacting the corresponding author and signing a data sharing agreement. Full article
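One common way to validate an IMU-derived parameter against an optical reference of this kind is a Bland-Altman analysis; the sketch below computes the bias and limits of agreement for a synthetic step-time parameter. This is a generic validation recipe, not a procedure taken from the dataset description.

```python
# Minimal sketch: Bland-Altman bias and limits of agreement between an IMU-derived
# parameter (e.g., step time) and the optical motion-capture reference. Synthetic data.
import numpy as np

rng = np.random.default_rng(4)
reference = rng.normal(0.55, 0.05, 100)            # reference step times (s)
imu = reference + rng.normal(0.005, 0.01, 100)     # IMU estimates with small bias and noise

diff = imu - reference
bias = diff.mean()
loa = 1.96 * diff.std(ddof=1)                      # half-width of the limits of agreement
print(f"Bias: {bias * 1000:.1f} ms, limits of agreement: +/- {loa * 1000:.1f} ms")
```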

28 pages, 3820 KiB  
Article
Supervised Learning Models for the Preliminary Detection of COVID-19 in Patients Using Demographic and Epidemiological Parameters
by Aditya Pradhan, Srikanth Prabhu, Krishnaraj Chadaga, Saptarshi Sengupta and Gopal Nath
Information 2022, 13(7), 330; https://doi.org/10.3390/info13070330 - 10 Jul 2022
Cited by 20 | Viewed by 3120
Abstract
The World Health Organization declared the COVID-19 outbreak a public health emergency of international concern on 30 January 2020 and characterized it as a global pandemic in March 2020. It has had catastrophic consequences for the world economy and the well-being of people and has put a tremendous strain on already-scarce healthcare systems globally, particularly in underdeveloped countries. Over 11 billion vaccine doses have already been administered worldwide, and the benefits of these vaccinations will take some time to appear. Today, the only practical approach to diagnosing COVID-19 is through the RT-PCR and RAT tests, which have sometimes been known to give unreliable results. Timely diagnosis and implementation of precautionary measures will likely improve the survival outcome and decrease the fatality rates. In this study, we propose an innovative way to predict COVID-19 with the help of alternative non-clinical methods such as supervised machine learning models to identify the patients at risk based on their characteristic parameters and underlying comorbidities. Medical records of patients from Mexico admitted between 23 January 2020 and 26 March 2022 were chosen for this purpose. Among several supervised machine learning approaches tested, the XGBoost model achieved the best results with an accuracy of 92%. It is an easy, non-invasive, inexpensive, instant and accurate way of forecasting those at risk of contracting the virus. However, it is too early to conclude that this method can be used as an alternative in the clinical diagnosis of coronavirus cases. Full article
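As a rough illustration of the kind of model comparison reported above, the sketch below trains several supervised classifiers on synthetic demographic and comorbidity features and reports held-out accuracy; the data, features, and hyperparameters are invented for the example and are unrelated to the Mexican records.

```python
# Minimal sketch: compare several supervised classifiers on synthetic
# demographic/comorbidity features; not the Mexican patient records.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

rng = np.random.default_rng(5)
n = 2000
age = rng.integers(18, 90, n)
comorbidities = rng.integers(0, 2, (n, 4))         # e.g., diabetes, hypertension, obesity, asthma
X = np.column_stack([age, comorbidities])
y = ((age > 60) | (comorbidities.sum(axis=1) >= 2)).astype(int)
y = np.where(rng.random(n) < 0.1, 1 - y, y)        # add label noise so the task is non-trivial

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "xgboost": XGBClassifier(n_estimators=200, max_depth=3, eval_metric="logloss"),
}
for name, clf in models.items():
    clf.fit(X_train, y_train)
    acc = accuracy_score(y_test, clf.predict(X_test))
    print(f"{name:20s} accuracy: {acc:.2f}")
```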
