Bioinformatics and Machine Learning in Disease Research

A special issue of Genes (ISSN 2073-4425). This special issue belongs to the section "Bioinformatics".

Deadline for manuscript submissions: closed (20 December 2022) | Viewed by 19387

Special Issue Editor


Guest Editor
1. Transformational Bioinformatics, Health and Biosecurity, Commonwealth Scientific and Industrial Research Organisation (CSIRO), North Ryde, NSW 2113, Australia
2. Applied BioSciences, Faculty of Science and Engineering, Macquarie University, Macquarie Park, NSW 2113, Australia
Interests: bioinformatics; transcriptomics; genomics; clinical genomics

Special Issue Information

Dear Colleagues,

Genomics is forecast to be the largest source of Big Data by 2025, totaling up to 40 exabytes per year and exceeding all current sources of Big Data combined (astronomy, YouTube, and Twitter). When other bioinformatics data are included in this, the vast volumes we have at our disposal become clear. ‘Data is the new gold’ is an often-used phrase that speaks to the value we can gain from such rich datasets, but to create this value, we need to effectively mine the data using innovative machine learning approaches.

Many prevalent human diseases, including diabetes, cardiovascular disease, and cancer, are driven by multiple complex factors. To understand the complex interplay of genetic and environmental factors driving these diseases, we need data-driven approaches which cater to large datasets, and machine learning offers us exactly this. Given the huge potential of machine learning and bioinformatics to understand human disease, a Special Issue of the journal Genes is being launched to explore the methods and applications of machine learning and bioinformatics in human disease. Authors are encouraged to submit original manuscripts describing the utilization of machine learning and bioinformatics to answer scientific questions relating to human disease. Also encouraged are papers describing new methods and reviews or comparisons of machine learning approaches in the context of human disease.

Dr. Natalie A. Twine
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Genes is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • machine learning
  • bioinformatics
  • human disease
  • genomics
  • proteomics
  • transcriptomics
  • big data
  • complex genetic disease

Published Papers (9 papers)


Research

17 pages, 3245 KiB  
Article
Research Trends of Vitamin D Metabolism Gene Polymorphisms Based on a Bibliometric Investigation
by Mohamed Abouzid, Marta Karaźniewicz-Łada, Basel Abdelazeem and James Robert Brašić
Genes 2023, 14(1), 215; https://doi.org/10.3390/genes14010215 - 14 Jan 2023
Cited by 2 | Viewed by 2847
Abstract
Vitamin D requires activation to show its pharmacological effect. While most studies investigate the association between vitamin D and disease, only a few focus on the impact of vitamin D metabolism gene polymorphisms (vitDMGPs). This bibliometric study aims to provide an overview of current publications on vitDMGPs (CYP27B1, CYP24A1, CYP2R1, CYP27A1, DHCR7/NADSYN1), compare them across countries, affiliations, and journals, and inspect keywords, co-citations, and citation bursts to identify trends in this research field. CiteSpace© (version 6.1.R3, Chaomei Chen), Bibliometrix© (R version 4.1.3 library, K-Synth Srl, University of Naples Federico II, Naples, Italy), VOSviewer© (version 1.6.1, Nees Jan van Eck and Ludo Waltman, Leiden University, Leiden, Netherlands) and Microsoft® Excel 365 (Microsoft, Redmond, Washington, USA) were used to classify and summarize Web of Science articles from 1998 to November 2022. We analyzed 2496 articles and built a timeline of co-citations and a bibliometric keywords co-occurrence map. The annual growth rate of vitDMGPs publications was 18.68%, and their relative research interest and published papers were increasing. The United States of America leads vitDMGPs research. The University of California System attained the highest quality of vitDMGPs research, followed by the American National Institutes of Health and Harvard University. The three most productive journals on vitDMGPs papers are J. Steroid. Biochem. Mol. Biol., PLOS ONE, and J. Clin. Endocrinol. Metab. We highlighted that the vitDMGPs domain is relatively new, and many novel research opportunities are available, especially those related to studying single nucleotide polymorphisms or markers in a specific gene in the vitamin D metabolism cycle and their association with disease. Genome-wide association studies, genetic variants of vitDMGPs, and vitamin D and its role in cancer risk were the most popular studies. CYP24A1 and CYP27A1 were the most-studied genes in vitDMGPs. Insulin was the longest-trending studied hormone associated with vitDMGPs. Trending topics in this field relate to bile acid metabolism, transcriptome and gene expression, biomarkers, single nucleotide polymorphism, and fibroblast growth factor 23. We also expect an increase in original research papers investigating the association between vitDMGPs and coronavirus disease 2019, hypercalcemia, Smith–Lemli–Opitz syndrome, 27-hydroxycholesterol, and Mendelian randomization. These findings will provide the foundations for innovations in the diagnosis and treatment of a vast spectrum of conditions.
(This article belongs to the Special Issue Bioinformatics and Machine Learning in Disease Research)

15 pages, 4519 KiB  
Article
Identification and Validation of Three m6A Regulators: FTO, HNRNPC, and HNRNPA2B1 as Potential Biomarkers for Endometriosis
by Jiani Sun, Lei Gan and Jing Sun
Genes 2023, 14(1), 86; https://doi.org/10.3390/genes14010086 - 28 Dec 2022
Cited by 4 | Viewed by 1844
Abstract
Background: N6-methyladenosine is involved in numerous biological processes. However, the significance of m6A regulators in endometriosis is still unclear. Methods: We extracted three significant m6A regulators between non-endometriosis and endometriosis patients from GSE6364 and then we used the random forest model to obtain significant m6A regulators. In addition, we used the nomogram model to evaluate the prevalence of endometriosis. The predictive ability of the candidate genes was evaluated through the receiver operating characteristic curves, while the expression of candidate biomarkers was validated via Western blotting. Additionally, according to candidate genes, we identified m6A subtypes based on which functional enrichment analysis and immune infiltration were performed. Results: Three significant m6A regulators (fat mass and obesity-associated protein, heterogeneous nuclear ribonucleoprotein A2/B1, and heterogeneous nuclear ribonucleoprotein C) were discovered. We identified three m6A subtypes, including clusterA, clusterB, and clusterC. ClusterB was demonstrated to be correlated with significantly overexpressed VEGF and notably downregulated ESR1 and PGR, which are convincing biomarkers of endometriosis. Furthermore, we discovered that patients in clusterB were associated with high levels of neutrophil infiltration, a reduced Treg/Th17 ratio, and overexpressed pyroptosis-related genes, which also indicated that clusterB was highly linked to endometriosis. Conclusion: m6A regulators are of great significance for the occurrence and progression of endometriosis. The findings of our study provide novel insights into the underlying molecular mechanism of endometriosis. The novel investigation of m6A patterns and their correlation with immunity may also help to guide the clinical diagnosis, provide prognostic significance, and develop immunotherapy strategies for endometriosis patients.

8 pages, 974 KiB  
Article
Machine Learning Heuristics on Gingivobuccal Cancer Gene Datasets Reveals Key Candidate Attributes for Prognosis
by Tanvi Singh, Girik Malik, Saloni Someshwar, Hien Thi Thu Le, Rathnagiri Polavarapu, Laxmi N. Chavali, Nidheesh Melethadathil, Vijayaraghava Seshadri Sundararajan, Jayaraman Valadi, P. B. Kavi Kishor and Prashanth Suravajhala
Genes 2022, 13(12), 2379; https://doi.org/10.3390/genes13122379 - 16 Dec 2022
Cited by 2 | Viewed by 1808
Abstract
Delayed cancer detection is one of the common causes of poor prognosis in the case of many cancers, including cancers of the oral cavity. Despite the improvement and development of new and efficient gene therapy treatments, very little has been carried out to algorithmically assess the impedance of these carcinomas. In this work, from attributes of NCBI’s oral cancer datasets, viz. (i) name, (ii) gene(s), (iii) protein change, (iv) condition(s), and clinical significance (last reviewed), we sought to train the number of instances emerging from them. Further, we attempt to annotate viable attributes in oral cancer gene datasets for the identification of gingivobuccal cancer (GBC). We further apply supervised and unsupervised machine learning methods to the gene datasets, revealing key candidate attributes for GBC prognosis. Our work highlights the importance of automated identification of key genes responsible for GBC that could perhaps be easily replicated in other forms of oral cancer detection.

17 pages, 1393 KiB  
Article
Biomarker Discovery for Meta-Classification of Melanoma Metastatic Progression Using Transfer Learning
by Jose Marie Antonio Miñoza, Jonathan Adam Rico, Pia Regina Fatima Zamora, Manny Bacolod, Reinhard Laubenbacher, Gerard G. Dumancas and Romulo de Castro
Genes 2022, 13(12), 2303; https://doi.org/10.3390/genes13122303 - 07 Dec 2022
Cited by 4 | Viewed by 2076
Abstract
Melanoma is considered to be the most serious and aggressive type of skin cancer, and metastasis appears to be the most important factor in its prognosis. Herein, we developed a transfer learning-based biomarker discovery model that could aid in the diagnosis and prognosis of this disease. After applying it to the ensemble machine learning model, results revealed that the genes found were consistent with those found using other methodologies previously applied to the same TCGA (The Cancer Genome Atlas) data set. Further novel biomarkers were also found. Our ensemble model achieved an AUC of 0.9861, an accuracy of 91.05, and an F1 score of 90.60 using an independent validation data set. This study was able to identify potential genes for diagnostic classification (C7 and GRIK5) and diagnostic and prognostic biomarkers (S100A7, S100A7, KRT14, KRT17, KRT6B, KRTDAP, SERPINB4, TSHR, PVRL4, WFDC5, IL20RB) in melanoma. The results show the utility of a transfer learning approach for biomarker discovery in melanoma.

20 pages, 2656 KiB  
Article
MSF-UBRW: An Improved Unbalanced Bi-Random Walk Method to Infer Human lncRNA-Disease Associations
by Lingyun Dai, Rong Zhu, Jinxing Liu, Feng Li, Juan Wang and Junliang Shang
Genes 2022, 13(11), 2032; https://doi.org/10.3390/genes13112032 - 04 Nov 2022
Cited by 2 | Viewed by 1411
Abstract
Long non-coding RNA (lncRNA) is a transcription product that exerts its biological functions through a variety of mechanisms. The occurrence and development of a series of human diseases are closely related to abnormal expression levels of lncRNAs. Scientists have developed many computational models to identify the lncRNA-disease associations (LDAs). However, many potential LDAs are still unknown. In this paper, a novel method, namely MSF-UBRW (multiple similarities fusion based on unbalanced bi-random walk), is designed to explore new LDAs. First, two similarities (functional similarity and Gaussian Interaction Profile kernel similarity) of lncRNAs are calculated and fused linearly; the same is done for diseases. Then, the known association matrix is preprocessed. Next, the linear neighbor similarities of lncRNAs and diseases are calculated, respectively. After that, the potential associations are predicted based on unbalanced bi-random walk. The fusion of multiple similarities improves the prediction performance of MSF-UBRW to a large extent. Finally, the prediction ability of the MSF-UBRW algorithm is measured by two statistical methods, leave-one-out cross-validation (LOOCV) and 5-fold cross-validation (5-fold CV). The AUCs of 0.9391 in LOOCV and 0.9183 (±0.0054) in 5-fold CV confirmed the reliable prediction ability of the MSF-UBRW method. Case studies of three common diseases also show that the MSF-UBRW method can infer new LDAs effectively.
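
As an illustration of one ingredient named in this abstract, the Gaussian Interaction Profile (GIP) kernel similarity can be sketched as follows. This is a minimal sketch of the commonly used textbook definition, not the authors' implementation: the function name gip_kernel_similarity, the gamma_prime parameter, and the toy association matrix are all assumptions made for illustration, and the abstract does not state which bandwidth setting MSF-UBRW uses.

```python
import math


def gip_kernel_similarity(assoc, gamma_prime=1.0):
    """Gaussian interaction profile kernel similarity between rows of `assoc`.

    `assoc` is a 0/1 association matrix given as a list of rows; each row is
    the interaction profile of one entity (e.g. one lncRNA across diseases).
    `gamma_prime` is a hypothetical bandwidth parameter (assumption).
    """
    n = len(assoc)
    # Squared Euclidean norm of every interaction profile.
    sq_norms = [sum(v * v for v in row) for row in assoc]
    mean_sq = sum(sq_norms) / n
    if mean_sq == 0:
        raise ValueError("association matrix contains no known associations")
    # Bandwidth normalised by the mean squared profile norm.
    gamma = gamma_prime / mean_sq
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = sum((a - b) ** 2 for a, b in zip(assoc[i], assoc[j]))
            sim[i][j] = math.exp(-gamma * d2)
    return sim


# Toy lncRNA x disease association matrix (hypothetical data).
toy = [
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
]
lnc_sim = gip_kernel_similarity(toy)
# Transposing the matrix gives the disease-side similarity.
dis_sim = gip_kernel_similarity([list(col) for col in zip(*toy)])
```

Entities that share more known associations receive a similarity closer to 1, so in the toy matrix the first two lncRNAs are rated more similar to each other than either is to the third.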

15 pages, 23054 KiB  
Article
A Comprehensive Analysis of KRT19 Combined with Immune Infiltration to Predict Breast Cancer Prognosis
by Lusi Mi, Nan Liang and Hui Sun
Genes 2022, 13(10), 1838; https://doi.org/10.3390/genes13101838 - 12 Oct 2022
Cited by 1 | Viewed by 1909
Abstract
To date, no study has been conducted to explore the mechanism of KRT19 and the correlation between the expression of KRT19 and immune infiltration in breast cancer (BRCA). TCGA, TIMER2.0, UALCAN, and other databases were used to analyze the expression, prognostic roles, epigenetic variants, and possible oncogenic mechanisms of KRT19 in BRCA. As a result, KRT19 showed higher expression in BRCA than in normal tissues. In addition, the epigenetic variation in KRT19, including gene alteration, mutation type and sites, DNA methylation, RNA modification, and phosphorylation, showed diversity in BRCA. Further mechanistic exploration suggested that the IL-17 signaling pathway and estrogen response might play essential roles in the regulation of KRT19. Moreover, KRT19 has different regulatory biological functions in BRCA. More importantly, the expression of KRT19 was closely related to immune infiltration and combining the two could effectively predict overall survival. Finally, a nomogram based on genes associated with cancer-immunity cycle signatures, which could predict progression-free interval, was constructed and evaluated successfully. In conclusion, KRT19 may play a role in the occurrence and development of BRCA through the IL-17 signaling pathway. Meanwhile, KRT19 combined with immune infiltration can evaluate the prognosis of BRCA patients.

12 pages, 370 KiB  
Article
Estimation of Metabolic Effects upon Cadmium Exposure during Pregnancy Using Tensor Decomposition
by Yuki Amakura and Y-h. Taguchi
Genes 2022, 13(10), 1698; https://doi.org/10.3390/genes13101698 - 22 Sep 2022
Viewed by 1579
Abstract
A simple tensor decomposition model was applied to the liver transcriptome analysis data to elucidate the cause of cadmium-induced gene overexpression. In addition, we estimated the mechanism by which prenatal Cd exposure disrupts insulin metabolism in offspring. Numerous studies have reported on the toxicity of Cd. A liver transcriptome analysis revealed that Cd toxicity induces intracellular oxidative stress and mitochondrial dysfunction via changes in gene expression, which in turn induces endoplasmic reticulum-associated degradation via abnormal protein folding. However, the specific mechanisms underlying these effects remain unknown. In this study, we found that Cd-induced endoplasmic reticulum stress may promote increased expression of tumor necrosis factor-α (TNF-α). Based on the high expression of genes involved in the production of sphingolipids, it was also found that the accumulation of ceramide may induce intracellular oxidative stress through the overproduction of reactive oxygen species. In addition, the high expression of a set of genes involved in the electron transfer system may contribute to oxidative stress. These findings allowed us to identify the mechanisms by which intracellular oxidative stress leads to the phosphorylation of insulin receptor substrate 1, which plays a significant role in the insulin signaling pathway.

15 pages, 5176 KiB  
Article
A Novel 3-Gene Signature for Identifying COVID-19 Patients Based on Bioinformatics and Machine Learning
by Guichuan Lai, Hui Liu, Jielian Deng, Kangjie Li and Biao Xie
Genes 2022, 13(9), 1602; https://doi.org/10.3390/genes13091602 - 08 Sep 2022
Cited by 11 | Viewed by 2223
Abstract
Although many biomarkers associated with coronavirus disease 2019 (COVID-19) were found, a novel signature relevant to immune cells has not been developed. In this work, the “CIBERSORT” algorithm was used to assess the fraction of immune infiltrating cells in GSE152641 and GSE171110. Key modules associated with important immune cells were selected by the “WGCNA” package. The “GO” enrichment analysis was used to reveal the biological function associated with COVID-19. The “Boruta” algorithm was used to screen candidate genes, and the “LASSO” algorithm was used for collinearity reduction. A novel gene signature was developed based on multivariate logistic regression analysis. Subsequently, M0 macrophages (PRAUC = 0.948 in GSE152641 and PRAUC = 0.981 in GSE171110) and neutrophils (PRAUC = 0.892 in GSE152641 and PRAUC = 0.960 in GSE171110) were considered as important immune cells. Forty-three intersected genes from two modules were selected, which mainly participated in some immune-related activities. Finally, a three-gene signature comprising CLEC4D, DUSP13, and UNC5A that can accurately distinguish COVID-19 patients and healthy controls in three datasets was constructed. The ROCAUC was 0.974 in the training set, 0.946 in the internal test set, and 0.709 in the external test set. In conclusion, we constructed a three-gene signature to identify COVID-19, and CLEC4D, DUSP13, and UNC5A may be potential biomarkers for COVID-19 patients.
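
Several abstracts in this issue, including this one, report the area under the ROC curve as their headline metric. For readers unfamiliar with it, the sketch below computes ROC AUC from first principles as the probability that a randomly chosen positive sample is scored above a randomly chosen negative one, with ties counted as one half. The function name roc_auc and the toy labels and scores are hypothetical; this is not the code used in any of the listed papers.

```python
def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison of positive and negative scores.

    `labels` holds 1 for cases and 0 for controls; `scores` holds the
    classifier output for each sample (higher means more likely a case).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: two cases, two controls.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why a drop from the training set to an external test set, as reported above, is usually read as a sign of reduced generalization.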

15 pages, 3881 KiB  
Article
ASPN Is a Potential Biomarker and Associated with Immune Infiltration in Endometriosis
by Li Wang and Jing Sun
Genes 2022, 13(8), 1352; https://doi.org/10.3390/genes13081352 - 28 Jul 2022
Cited by 3 | Viewed by 2759
Abstract
Objective: Endometriosis is a benign gynecological disease characterized by distant metastasis. Previous studies have discovered abnormal numbers and function of immune cells in endometriotic lesions. We aimed to find potential biomarkers of endometriosis and to explore the relationship between ASPN and the immune microenvironment of endometriosis. Methods: We obtained the GSE141549 and GSE7305 datasets containing endometriosis and normal endometrial samples from the Gene Expression Omnibus database (GEO). In the GSE141549 dataset, differentially expressed genes (DEGs) were found. The Least Absolute Shrinkage and Selection Operator (Lasso) regression and generalized linear models (GLMs) were used to screen new biomarkers. The expression levels and diagnostic utility of biomarkers were assessed in GSE7305, and biomarker expression levels were further validated using qRT-PCR and western blot. We identified DEGs between high and low expression groups of key biomarkers. Enrichment analysis was carried out to discover the target gene’s biological function. We analyzed the relationship between key biomarker expression and patient clinical features. Finally, the immune cells that infiltrate endometriosis were assessed using the Microenvironment Cell Population-Counter (MCP-counter), and the correlation of biomarker expression with immune cell infiltration and immune checkpoint genes was studied. Results: There were a total of 38 DEGs discovered. Two machine learning techniques were used to identify 10 genes. Six biomarkers (SCG2, ASPN, SLIT2, GEM, EGR1, and FOS) had good diagnostic efficiency (AUC > 0.7) by internal and external validation. We excluded previously reported related genes (SLIT2, EGR1, and FOS). ASPN was the most significantly differentially expressed biomarker between normal and ectopic endometrial tissues, as verified by qPCR. The western blot assay revealed a significant upregulation of ASPN expression in endometriotic tissues. The investigation of DEGs in the ASPN high- and low-expression groups revealed that the DEGs were particularly enriched in extracellular matrix tissue, vascular smooth muscle contraction, cytokine interactions, the calcium signaling pathway, and the chemokine signaling pathway. High ASPN expression was related to r-AFS stage (p = 0.006), age (p = 0.03), and lesion location (p < 0.001). Univariate and multivariate logistic regression analysis showed that ASPN expression was an independent influencing factor in patients with endometriosis. Immune cell infiltration analysis revealed a significant increase in T-cell, B-cell, and fibroblast infiltration in endometriosis lesions; cytotoxic lymphocyte, NK-cell, and endothelial cell infiltration were reduced. Additionally, the percentage of T cells, B cells, fibroblasts, and endothelial cells was positively correlated with ASPN expression, while the percentage of cytotoxic lymphocytes and NK cells was negatively correlated. Immune checkpoint gene (CTLA4, LAG3, CD27, CD40, and ICOS) expression and ASPN expression were positively associated. Conclusions: Increased expression of ASPN is associated with immune infiltration in endometriosis, and ASPN can be used as a diagnostic biomarker as well as a potential immunotherapeutic target in endometriosis.