Special Issue "Deep Learning and Machine Learning in Bioinformatics"

A special issue of International Journal of Molecular Sciences (ISSN 1422-0067). This special issue belongs to the section "Molecular Informatics".

Deadline for manuscript submissions: closed (31 October 2021).

Special Issue Editors

Dr. Jung Hun Oh
Guest Editor
Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
Interests: radiogenomics; bioinformatics; genome-wide association study; tumor mutational burden; cancer biology
Dr. Mingon Kang
Guest Editor
Department of Computer Science, University of Nevada, Las Vegas, NV 89154, USA
Interests: bioinformatics; machine learning; data mining; computer vision; big data analytics

Special Issue Information

Dear Colleagues,

A Special Issue on the topic "Deep Learning and Machine Learning in Bioinformatics" is being prepared for the journal IJMS. In recent years, deep learning has drawn attention as a highly active research field, with great success in machine learning areas such as image analysis, speech recognition, and natural language processing; now, its potential is being actively discussed in the field of biomedicine. In particular, a rapidly increasing number of deep-learning-based approaches have been proposed for biomedical image and signal processing. However, the application of deep learning in other biomedical areas, such as genomics and computational biology, has been rather limited because deep learning architectures are difficult to define and interpret. Moreover, many challenging open problems in deep learning still need to be solved to allow its active use in the field of bioinformatics.

In this Special Issue, we envision that the application of deep learning and novel machine learning methods to biological problems will provide practical guides and useful solutions to improve predictive performance and enhance our understanding of biological mechanisms of target diseases. We invite your contributions in the form of original research articles with sound and innovative methodology.

This Special Issue welcomes topics including (but not limited to) the following:

  • Protein structures;
  • Single-cell clustering;
  • Next-generation sequencing;
  • Gene expression regulations;
  • Genome-wide association studies;
  • Clustering cancer subtypes in multi-omics data;
  • Prediction of clinical outcomes using genomic data;
  • Prediction of drug response;
  • Integration of omics data;
  • Mathematical modeling in cancer;
  • Protein function prediction.

Dr. Jung Hun Oh
Dr. Mingon Kang
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All papers will be peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. International Journal of Molecular Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. There is an Article Processing Charge (APC) for publication in this open access journal. For details about the APC please see here. Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Published Papers (16 papers)


Research


Article
Sparsely Connected Autoencoders: A Multi-Purpose Tool for Single Cell omics Analysis
Int. J. Mol. Sci. 2021, 22(23), 12755; https://doi.org/10.3390/ijms222312755 - 25 Nov 2021
Viewed by 229
Abstract
Background: Biological processes are based on complex networks of cells and molecules. Single cell multi-omics is a new tool aiming to provide new insights into the complex network of events controlling the functionality of the cell. Methods: Since single cell technologies provide many sample measurements, they are the ideal environment for the application of Deep Learning and Machine Learning approaches. An autoencoder is composed of an encoder and a decoder sub-model. An autoencoder is a very powerful tool in data compression and noise removal. However, the decoder model remains a black box from which it is impossible to depict the contribution of the single input elements. We have recently developed a new class of autoencoders, called Sparsely Connected Autoencoders (SCA), which have the advantage of providing a controlled association between the input layer and the decoder module. This new architecture has the benefit that the decoder model is no longer a black box and can be used to depict new biologically interesting features from single cell data. Results: Here, we show that the SCA hidden layer can capture new information usually hidden in single cell data, such as clustering on meta-features that are difficult (i.e., transcription factor expression) or technically not possible (i.e., miRNA expression) to depict in single cell RNAseq data. Furthermore, the SCA representation of cell clusters has the advantage of simulating a conventional bulk RNAseq, which is a data transformation allowing the identification of similarity among independent experiments. Conclusions: In our opinion, SCA represents the bioinformatics version of a universal “Swiss-knife” for the extraction of hidden knowledgeable features from single cell omics data. Full article
(This article belongs to the Special Issue Deep Learning and Machine Learning in Bioinformatics)

Article
A Deep Learning Approach with Data Augmentation to Predict Novel Spider Neurotoxic Peptides
Int. J. Mol. Sci. 2021, 22(22), 12291; https://doi.org/10.3390/ijms222212291 - 13 Nov 2021
Viewed by 454
Abstract
As major components of spider venoms, neurotoxic peptides exhibit structural diversity and target specificity, and have great pharmaceutical potential. Deep learning may be an alternative to the laborious and time-consuming methods for identifying these peptides. However, the major hurdle in developing a deep learning model is the limited data on neurotoxic peptides. Here, we present a peptide data augmentation method that improves the recognition of neurotoxic peptides via a convolutional neural network model. The neurotoxic peptides were augmented with the known neurotoxic peptides from the UniProt database, and the models were trained using a training set with or without the generated sequences to verify the augmented data. The model trained with the augmented dataset outperformed the one trained with the unaugmented dataset, achieving an accuracy of 0.9953, precision of 0.9922, recall of 0.9984, and F1 score of 0.9953 on the simulation dataset. From the set of all RNA transcripts of the spider Callobius koreanus, we discovered neurotoxic peptides via the model, resulting in 275 putative peptides, of which 252 were novel sequences and only 23 showed homology with known peptides according to the Basic Local Alignment Search Tool. Among these 275 peptides, four were selected and shown to have neuromodulatory effects on the human neuroblastoma cell line SH-SY5Y. The augmentation method presented here may be applied to the identification of other functional peptides from biological resources with insufficient data. Full article
(This article belongs to the Special Issue Deep Learning and Machine Learning in Bioinformatics)

Article
Multi-Run Concrete Autoencoder to Identify Prognostic lncRNAs for 12 Cancers
Int. J. Mol. Sci. 2021, 22(21), 11919; https://doi.org/10.3390/ijms222111919 - 03 Nov 2021
Viewed by 358
Abstract
Background: Long non-coding RNA plays a vital role in changing the expression profiles of various target genes that lead to cancer development. Thus, identifying prognostic lncRNAs related to different cancers might help in developing cancer therapy. Method: To discover the critical lncRNAs that can identify the origin of different cancers, we propose the use of the state-of-the-art deep learning algorithm concrete autoencoder (CAE) in an unsupervised setting, which efficiently identifies a subset of the most informative features. However, CAE does not identify reproducible features in different runs due to its stochastic nature. We thus propose a multi-run CAE (mrCAE) to identify a stable set of features to address this issue. The assumption is that a feature appearing in multiple runs carries more meaningful information about the data under consideration. The genome-wide lncRNA expression profiles of 12 different types of cancers, with a total of 4768 samples available in The Cancer Genome Atlas (TCGA), were analyzed to discover the key lncRNAs. The lncRNAs identified by multiple runs of CAE were added to a final list of key lncRNAs that are capable of identifying 12 different cancers. Results: Our results showed that mrCAE performs better in feature selection than single-run CAE, standard autoencoder (AE), and other state-of-the-art feature selection techniques. This study revealed a set of top-ranking 128 lncRNAs that could identify the origin of 12 different cancers with an accuracy of 95%. Survival analysis showed that 76 of 128 lncRNAs have the prognostic capability to differentiate high- and low-risk groups of patients with different cancers. Conclusion: The proposed mrCAE, which selects actual features, outperformed the AE even though it selects the latent or pseudo-features. By selecting actual features instead of pseudo-features, mrCAE can be valuable for precision medicine. The identified prognostic lncRNAs can be further studied to develop therapies for different cancers. Full article
(This article belongs to the Special Issue Deep Learning and Machine Learning in Bioinformatics)
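The multi-run idea in the abstract above (keep features that recur across stochastic runs) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `stable_features`, the `min_runs` threshold, and the toy feature names are all assumptions made for illustration, and each run is represented simply as the list of features it selected.

```python
from collections import Counter

def stable_features(runs, min_runs=2):
    """Return features selected in at least `min_runs` of the given runs.

    `runs` is a list of per-run feature selections (hypothetical input;
    in practice each would come from one stochastic feature-selection run).
    """
    counts = Counter()
    for selected in runs:
        # Use a set so a feature counts at most once per run.
        counts.update(set(selected))
    return sorted(f for f, c in counts.items() if c >= min_runs)

# Toy example with made-up feature names.
runs = [["a", "b", "c"], ["b", "c", "d"], ["c", "e"]]
print(stable_features(runs))              # features seen in 2 or more runs
print(stable_features(runs, min_runs=3))  # features seen in every run
```

The threshold is a design choice: a higher `min_runs` gives a smaller, more reproducible list at the cost of discarding features that are informative but selected less consistently.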

Article
Machine Learning Assisted Approach for Finding Novel High Activity Agonists of Human Ectopic Olfactory Receptors
Int. J. Mol. Sci. 2021, 22(21), 11546; https://doi.org/10.3390/ijms222111546 - 26 Oct 2021
Viewed by 442
Abstract
Olfactory receptors (ORs) constitute the largest superfamily of G protein-coupled receptors (GPCRs). ORs are involved in sensing odorants as well as in other ectopic roles in non-nasal tissues. Matching the enormous olfactory stimulation repertoire to its counterpart ORs through machine learning (ML) will enable understanding of the olfactory system, receptor characterization, and exploitation of their therapeutic potential. In the current study, we have selected two broadly tuned ectopic human OR proteins, OR1A1 and OR2W1, for expanding their known chemical space by using molecular descriptors. We present a scheme for selecting the optimal features required to train an ML-based model, based on which we selected the random forest (RF) as the best performer. High activity agonist prediction involved screening five databases comprising ~23 M compounds, using the trained RF classifier. To evaluate the effectiveness of the machine learning based virtual screening and check receptor binding site compatibility, we used docking of the top target ligands to carefully develop receptor model structures. Finally, experimental validation of selected compounds with significant docking scores through in vitro assays revealed two high activity novel agonists for OR1A1 and one for OR2W1. Full article
(This article belongs to the Special Issue Deep Learning and Machine Learning in Bioinformatics)

Article
iBitter-Fuse: A Novel Sequence-Based Bitter Peptide Predictor by Fusing Multi-View Features
Int. J. Mol. Sci. 2021, 22(16), 8958; https://doi.org/10.3390/ijms22168958 - 19 Aug 2021
Viewed by 898
Abstract
Accurate identification of bitter peptides is of great importance for better understanding their biochemical and biophysical properties. To date, machine learning-based methods have become effective approaches for identifying potential bitter peptides from large-scale protein datasets. Although a few machine learning-based predictors have been developed for identifying the bitterness of peptides, their prediction performances could be improved. In this study, we developed a new predictor (named iBitter-Fuse) for achieving more accurate identification of bitter peptides. In the proposed iBitter-Fuse, we have integrated a variety of feature encoding schemes for providing sufficient information from different aspects, namely compositional information and physicochemical properties. To enhance the predictive performance, the customized genetic algorithm utilizing self-assessment-report (GA-SAR) was employed for identifying informative features, followed by inputting the optimal ones into a support vector machine (SVM)-based classifier for developing the final model (iBitter-Fuse). Benchmarking experiments based on both 10-fold cross-validation and independent tests indicated that iBitter-Fuse was able to achieve more accurate performance as compared to state-of-the-art methods. To facilitate the high-throughput identification of bitter peptides, the iBitter-Fuse web server was established and made freely available online. It is anticipated that iBitter-Fuse will be a useful tool for aiding the discovery and de novo design of bitter peptides. Full article
(This article belongs to the Special Issue Deep Learning and Machine Learning in Bioinformatics)

Article
Cross-Predicting Essential Genes between Two Model Eukaryotic Species Using Machine Learning
Int. J. Mol. Sci. 2021, 22(10), 5056; https://doi.org/10.3390/ijms22105056 - 11 May 2021
Cited by 1 | Viewed by 919
Abstract
Experimental studies of Caenorhabditis elegans and Drosophila melanogaster have contributed substantially to our understanding of molecular and cellular processes in metazoans at large. Since the publication of their genomes, functional genomic investigations have identified genes that are essential or non-essential for survival in each species. Recently, a range of features linked to gene essentiality have been inferred using a machine learning (ML)-based approach, allowing essentiality predictions within a species. Nevertheless, predictions between species are still elusive. Here, we undertake a comprehensive study using ML to discover and validate features of essential genes common to both C. elegans and D. melanogaster. We demonstrate that the cross-species prediction of gene essentiality is possible using a subset of features linked to nucleotide/protein sequences, protein orthology and subcellular localisation, single-cell RNA-seq, and histone methylation markers. Complementary analyses showed that essential genes are enriched for transcription and translation functions and are preferentially located away from heterochromatin regions of C. elegans and D. melanogaster chromosomes. The present work should enable the cross-prediction of essential genes between model and non-model metazoans. Full article
(This article belongs to the Special Issue Deep Learning and Machine Learning in Bioinformatics)

Article
smartPARE: An R Package for Efficient Identification of True mRNA Cleavage Sites
Int. J. Mol. Sci. 2021, 22(8), 4267; https://doi.org/10.3390/ijms22084267 - 20 Apr 2021
Cited by 1 | Viewed by 671
Abstract
Degradome sequencing is commonly used to generate high-throughput information on mRNA cleavage sites mediated by small RNAs (sRNA). In our datasets of potato (Solanum tuberosum, St) and Phytophthora infestans (Pi), initial predictions generated high numbers of cleavage site predictions, which highlighted the need for improved analytic tools. Here, we present an R package based on a deep learning convolutional neural network (CNN) in a machine learning environment to optimize discrimination of false from true cleavage sites. When applying smartPARE to our datasets on potato during the infection process by the late blight pathogen, 7.3% of all cleavage windows represented true cleavages distributed on 214 sites in P. infestans and 444 sites in potato. The sRNA landscape of the two organisms is complex, with uneven sRNA production and cleavage regions widespread in the two genomes. Multiple targets and several cases of complex regulatory cascades, particularly in potato, were revealed. We conclude that our new analytic approach is useful for anyone working on complex biological systems and interested in identifying cleavage sites, particularly those inferred by sRNA classes beyond miRNAs. Full article
(This article belongs to the Special Issue Deep Learning and Machine Learning in Bioinformatics)

Article
MET Exon 14 Skipping: A Case Study for the Detection of Genetic Variants in Cancer Driver Genes by Deep Learning
Int. J. Mol. Sci. 2021, 22(8), 4217; https://doi.org/10.3390/ijms22084217 - 19 Apr 2021
Viewed by 804
Abstract
Background: Disruption of alternative splicing (AS) is frequently observed in cancer and might represent an important signature for tumor progression and therapy. Exon skipping (ES) represents one of the most frequent AS events, and in non-small cell lung cancer (NSCLC) MET exon 14 skipping was shown to be targetable. Methods: We constructed neural networks (NN/CNN) specifically designed to detect MET exon 14 skipping events using RNAseq data. Furthermore, for discovery purposes we also developed a sparsely connected autoencoder to identify uncharacterized MET isoforms. Results: The neural networks had a MET exon 14 skipping detection rate greater than 94% when tested on a manually curated set of 690 TCGA bronchus and lung samples. When globally applied to 2605 TCGA samples, we observed that the majority of false positives were characterized by a blurry coverage of exon 14 but, interestingly, shared a common coverage peak in the second intron; we speculate that this event could be the transcription signature of a LINE1 (Long Interspersed Nuclear Element 1)-MET (Mesenchymal Epithelial Transition receptor tyrosine kinase) fusion. Conclusions: Taken together, our results indicate that neural networks can be an effective tool to provide a quick classification of pathological transcription events, and sparsely connected autoencoders could represent the basis for the development of an effective discovery tool. Full article
(This article belongs to the Special Issue Deep Learning and Machine Learning in Bioinformatics)

Article
PredNTS: Improved and Robust Prediction of Nitrotyrosine Sites by Integrating Multiple Sequence Features
Int. J. Mol. Sci. 2021, 22(5), 2704; https://doi.org/10.3390/ijms22052704 - 08 Mar 2021
Cited by 3 | Viewed by 951
Abstract
Nitrotyrosine, which is generated by numerous reactive nitrogen species, is a type of protein post-translational modification. Identification of site-specific nitration modification on tyrosine is a prerequisite to understanding the molecular function of nitrated proteins. Thanks to the progress of machine learning, computational prediction can play a vital role before biological experimentation. Herein, we developed a computational predictor, PredNTS, by integrating multiple sequence features including K-mer, composition of k-spaced amino acid pairs (CKSAAP), AAindex, and binary encoding schemes. The important features were selected by the recursive feature elimination approach using a random forest classifier. Finally, we linearly combined the successive random forest (RF) probability scores generated by the different RF models, each employing a single encoding. The resultant PredNTS predictor achieved an area under the curve (AUC) of 0.910 using five-fold cross validation. It outperformed the existing predictors on a comprehensive and independent dataset. Furthermore, we investigated several machine learning algorithms to demonstrate the superiority of the employed RF algorithm. PredNTS is a useful computational resource for the prediction of nitrotyrosine sites. The web application with the curated datasets of PredNTS is publicly available. Full article
(This article belongs to the Special Issue Deep Learning and Machine Learning in Bioinformatics)
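One of the encodings named in the abstract above, the composition of k-spaced amino acid pairs (CKSAAP), can be sketched as follows. This is a generic illustration rather than the PredNTS code: the function name `cksaap`, the choice to skip pairs containing non-standard residues, and the normalisation by the number of valid pairs are assumptions, and the abstract does not state the window size or the range of k used.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def cksaap(sequence, k):
    """Frequency of each residue pair separated by exactly k residues.

    Returns a dict with 400 entries (one per ordered pair of standard
    amino acids). Pairs containing other characters are skipped
    (an assumption of this sketch).
    """
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    total = 0
    for i in range(len(sequence) - k - 1):
        pair = sequence[i] + sequence[i + k + 1]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        # Sequence too short for this gap, or no valid pairs.
        return {pair: 0.0 for pair in pairs}
    return {pair: n / total for pair, n in counts.items()}

# Toy peptide: with a gap of one residue, "ACAC" yields the pairs AA and CC.
features = cksaap("ACAC", k=1)
print(features["AA"], features["CC"])
```

A full feature vector is typically built by concatenating the 400-dimensional outputs for several values of k.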

Article
Dissecting Response to Cancer Immunotherapy by Applying Bayesian Network Analysis to Flow Cytometry Data
Int. J. Mol. Sci. 2021, 22(5), 2316; https://doi.org/10.3390/ijms22052316 - 26 Feb 2021
Cited by 1 | Viewed by 881
Abstract
Cancer immunotherapy, specifically immune checkpoint blockade, has been found to be effective in the treatment of metastatic cancers. However, only a subset of patients achieve clinical responses. Elucidating pretreatment biomarkers predictive of sustained clinical response is a major research priority. Another research priority is evaluating changes in the immune system before and after treatment in responders vs. nonresponders. Our group has been studying immune networks as an accurate reflection of the global immune state. Flow cytometry (FACS, fluorescence-activated cell sorting) data characterizing immune cell panels in peripheral blood mononuclear cells (PBMC) from gastroesophageal adenocarcinoma (GEA) patients were used to analyze changes in immune networks in this setting. Here, we describe a novel computational pipeline to perform secondary analyses of FACS data using systems biology/machine learning techniques and concepts. The pipeline is centered around comparative Bayesian network analyses of immune networks and is capable of detecting strong signals that conventional methods (such as FlowJo manual gating) might miss. Future studies are planned to validate and follow up the immune biomarkers (and combinations/interactions thereof) associated with clinical responses identified with this computational pipeline. Full article
(This article belongs to the Special Issue Deep Learning and Machine Learning in Bioinformatics)

Article
PUP-Fuse: Prediction of Protein Pupylation Sites by Integrating Multiple Sequence Representations
Int. J. Mol. Sci. 2021, 22(4), 2120; https://doi.org/10.3390/ijms22042120 - 20 Feb 2021
Cited by 2 | Viewed by 852
Abstract
Pupylation is a type of reversible post-translational modification of proteins, which plays a key role in the cellular function of microbial organisms. Several proteomics methods have been developed for the prediction and analysis of pupylated proteins and pupylation sites. However, the traditional experimental methods are laborious and time-consuming. Hence, computational algorithms that can predict potential pupylation sites using sequence features are highly needed. In this research, a new prediction model, PUP-Fuse, has been developed for pupylation site prediction by integrating multiple sequence representations. We explored five types of feature encoding approaches and three machine learning (ML) algorithms. In the final model, we integrated the successive ML scores using a linear regression model. PUP-Fuse achieved a Matthews correlation coefficient of 0.768 in a 10-fold cross-validation test. It also outperformed existing predictors in an independent test. The web server of PUP-Fuse with curated datasets is freely available. Full article
(This article belongs to the Special Issue Deep Learning and Machine Learning in Bioinformatics)

Article
A Computational Framework Based on Ensemble Deep Neural Networks for Essential Genes Identification
Int. J. Mol. Sci. 2020, 21(23), 9070; https://doi.org/10.3390/ijms21239070 - 28 Nov 2020
Cited by 29 | Viewed by 1236
Abstract
Essential genes contain key information of genomes that could be the key to a comprehensive understanding of life and evolution. Because of their importance, studies of essential genes have been considered a crucial problem in computational biology. Computational methods for identifying essential genes have become increasingly popular to reduce the cost and time consumption of traditional experiments. A few models have addressed this problem, but performance is still not satisfactory because of high dimensional features and the use of traditional machine learning algorithms. Thus, there is a need to create a novel model to improve the predictive performance for this problem from DNA sequence features. This study took advantage of a natural language processing (NLP) model in learning biological sequences by treating them as natural language words. To learn from the NLP features, an ensemble deep neural network was subsequently employed as the supervised learning model. Our proposed method could identify essential genes with sensitivity, specificity, accuracy, Matthews correlation coefficient (MCC), and area under the receiver operating characteristic curve (AUC) values of 60.2%, 84.6%, 76.3%, 0.449, and 0.814, respectively. The overall performance outperformed the single models without ensemble, as well as the state-of-the-art predictors on the same benchmark dataset. This indicated the effectiveness of the proposed method in determining essential genes, in particular, and other sequencing problems, in general. Full article
(This article belongs to the Special Issue Deep Learning and Machine Learning in Bioinformatics)
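For readers comparing the figures reported across these abstracts, the threshold-based metrics mentioned above (sensitivity, specificity, accuracy, and MCC) all follow from the four cells of a binary confusion matrix. The sketch below uses the standard definitions; the function name `binary_metrics` and the toy counts are hypothetical and are not taken from any of the papers. AUC is omitted because it requires ranked prediction scores rather than counts.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts.

    Assumes both classes are present (tp + fn > 0 and tn + fp > 0).
    """
    sensitivity = tp / (tp + fn)              # true positive rate (recall)
    specificity = tn / (tn + fp)              # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }

# Hypothetical counts, for illustration only.
print(binary_metrics(tp=6, fp=2, tn=8, fn=4))
```

Unlike accuracy, MCC uses all four cells symmetrically, which is why it is often reported alongside sensitivity and specificity for imbalanced datasets.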

Review


Review
Protein Design with Deep Learning
Int. J. Mol. Sci. 2021, 22(21), 11741; https://doi.org/10.3390/ijms222111741 - 29 Oct 2021
Viewed by 392
Abstract
Computational Protein Design (CPD) has produced impressive results for engineering new proteins, resulting in a wide variety of applications. In the past few years, various efforts have aimed at replacing or improving existing design methods using Deep Learning technology to leverage the amount of publicly available protein data. Deep Learning (DL) is a very powerful tool to extract patterns from raw data, provided that data are formatted as mathematical objects and the architecture processing them is well suited to the targeted problem. In the case of protein data, specific representations are needed for both the amino acid sequence and the protein structure in order to capture respectively 1D and 3D information. As no consensus has been reached about the most suitable representations, this review describes the representations used so far, discusses their strengths and weaknesses, and details their associated DL architecture for design and related tasks. Full article
(This article belongs to the Special Issue Deep Learning and Machine Learning in Bioinformatics)

Review
Artificial Intelligence in Bulk and Single-Cell RNA-Sequencing Data to Foster Precision Oncology
Int. J. Mol. Sci. 2021, 22(9), 4563; https://doi.org/10.3390/ijms22094563 - 27 Apr 2021
Cited by 1 | Viewed by 1156
Abstract
Artificial intelligence, the discipline of developing computational algorithms able to perform tasks that require human intelligence, offers the opportunity to improve how we conceive and deliver precision medicine. Here, we provide an overview of artificial intelligence approaches for the analysis of large-scale RNA-sequencing datasets in cancer. We present the major solutions to disentangle inter- and intra-tumor heterogeneity of transcriptome profiles for an effective improvement of patient management. We outline the contributions of learning algorithms to the needs of cancer genomics, from identifying rare cancer subtypes to personalizing therapeutic treatments.
(This article belongs to the Special Issue Deep Learning and Machine Learning in Bioinformatics)

Review
Towards the Interpretability of Machine Learning Predictions for Medical Applications Targeting Personalised Therapies: A Cancer Case Survey
Int. J. Mol. Sci. 2021, 22(9), 4394; https://doi.org/10.3390/ijms22094394 - 22 Apr 2021
Cited by 1 | Viewed by 1268
Abstract
Artificial Intelligence is providing astonishing results, with medicine being one of its favourite playgrounds. Machine Learning and, in particular, Deep Neural Networks are behind this revolution. Among the most challenging targets of interest in medicine are cancer diagnosis and therapies but, to start this revolution, software tools need to be adapted to cover the new requirements. In this sense, learning tools are becoming a commodity but, to be able to assist doctors on a daily basis, it is essential to fully understand how models can be interpreted. In this survey, we analyse current machine learning models and other in silico tools as applied to medicine, specifically to cancer research, and we discuss their interpretability, their performance and the input data they are fed. Artificial neural networks (ANNs), logistic regression (LR) and support vector machines (SVMs) have been observed to be the preferred models. In addition, convolutional neural networks (CNNs), supported by the rapid development of graphics processing units (GPUs) and high-performance computing (HPC) infrastructures, are gaining importance when image processing is feasible. However, the interpretability of machine learning predictions, which would let doctors understand them, trust them and gain useful insights for clinical practice, is still rarely considered. This factor needs to be improved to enhance doctors’ predictive capacity and achieve individualised therapies in the near future.
(This article belongs to the Special Issue Deep Learning and Machine Learning in Bioinformatics)

Review
Incorporating Machine Learning into Established Bioinformatics Frameworks
Int. J. Mol. Sci. 2021, 22(6), 2903; https://doi.org/10.3390/ijms22062903 - 12 Mar 2021
Cited by 2 | Viewed by 4994
Abstract
The exponential growth of biomedical data in recent years has spurred the application of numerous machine learning techniques to address emerging problems in biology and clinical research. By enabling automatic feature extraction, feature selection, and the generation of predictive models, these methods can be used to efficiently study complex biological systems. Machine learning techniques are frequently integrated with bioinformatic methods, as well as curated databases and biological networks, to enhance training and validation, identify the best interpretable features, and enable feature and model investigation. Here, we review recently developed methods that incorporate machine learning within the same framework as techniques from molecular evolution, protein structure analysis, systems biology, and disease genomics. We outline the challenges posed for machine learning, and, in particular, deep learning in biomedicine, and suggest unique opportunities for machine learning techniques integrated with established bioinformatics approaches to overcome some of these challenges.
(This article belongs to the Special Issue Deep Learning and Machine Learning in Bioinformatics)