Biocomputing and Synthetic Biology in Cells

A special issue of Cells (ISSN 2073-4409). This special issue belongs to the section "Cell Methods".

Deadline for manuscript submissions: closed (28 September 2020) | Viewed by 58405

Special Issue Editor


grade E-Mail Website
Guest Editor
Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
Interests: bioinformatics; parallel computing; deep learning; protein classification; genome assembly
Special Issues, Collections and Topics in MDPI journals

Special Issue Information

Dear Colleagues,

Biocomputing and synthetic biology have been two of the most exciting emerging fields in recent years. Biocomputing focuses on developing novel computational models beyond the Turing machine, such as DNA computing and membrane computing. It aims at to create a super machine in cells without any silicon. Synthetic biology, a more detailed extension of biocomputing, involves the design of circuits, simulations, and cell analysis. It is interdisciplinary, involving the chemical industry, biotechnology, computer science and mathematics.

For this Special Issue, we invite the submission of papers on new emerging topics or computational techniques in biocomputing and synthetic biology, particularly those involving interdisciplinary research.

Prof. Quan Zou
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Cells is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2700 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • biocomputing
  • DNA computing
  • membrane computing
  • bioinformatics
  • neural computing
  • computational systems biology
  • synthetic biology
  • bio-inspired computing
  • DNA storage

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue polices can be found here.

Related Special Issue

Published Papers (14 papers)

Order results
Result details
Select all
Export citation of selected articles as:

Editorial

Jump to: Research

4 pages, 185 KiB  
Editorial
Biocomputing and Synthetic Biology in Cells: Cells Special Issue
by Feifei Cui and Quan Zou
Cells 2020, 9(11), 2459; https://doi.org/10.3390/cells9112459 - 11 Nov 2020
Viewed by 2135
Abstract
Biocomputing and synthetic biology have been two of the most exciting emerging fields in recent years [...] Full article
(This article belongs to the Special Issue Biocomputing and Synthetic Biology in Cells)

Research

Jump to: Editorial

15 pages, 2314 KiB  
Article
Detecting Interactive Gene Groups for Single-Cell RNA-Seq Data Based on Co-Expression Network Analysis and Subgraph Learning
by Xiucai Ye, Weihang Zhang, Yasunori Futamura and Tetsuya Sakurai
Cells 2020, 9(9), 1938; https://doi.org/10.3390/cells9091938 - 21 Aug 2020
Cited by 11 | Viewed by 4154
Abstract
High-throughput sequencing technologies have enabled the generation of single-cell RNA-seq (scRNA-seq) data, which explore both genetic heterogeneity and phenotypic variation between cells. Some methods have been proposed to detect the related genes causing cell-to-cell variability for understanding tumor heterogeneity. However, most existing methods [...] Read more.
High-throughput sequencing technologies have enabled the generation of single-cell RNA-seq (scRNA-seq) data, which explore both genetic heterogeneity and phenotypic variation between cells. Some methods have been proposed to detect the related genes causing cell-to-cell variability for understanding tumor heterogeneity. However, most existing methods detect the related genes separately, without considering gene interactions. In this paper, we proposed a novel learning framework to detect the interactive gene groups for scRNA-seq data based on co-expression network analysis and subgraph learning. We first utilized spectral clustering to identify the subpopulations of cells. For each cell subpopulation, the differentially expressed genes were then selected to construct a gene co-expression network. Finally, the interactive gene groups were detected by learning the dense subgraphs embedded in the gene co-expression networks. We applied the proposed learning framework on a real cancer scRNA-seq dataset to detect interactive gene groups of different cancer subtypes. Systematic gene ontology enrichment analysis was performed to examine the detected genes groups by summarizing the key biological processes and pathways. Our analysis shows that different subtypes exhibit distinct gene co-expression networks and interactive gene groups with different functional enrichment. The interactive genes are expected to yield important references for understanding tumor heterogeneity. Full article
(This article belongs to the Special Issue Biocomputing and Synthetic Biology in Cells)
Show Figures

Figure 1

16 pages, 2333 KiB  
Article
DNC4mC-Deep: Identification and Analysis of DNA N4-Methylcytosine Sites Based on Different Encoding Schemes By Using Deep Learning
by Abdul Wahab, Omid Mahmoudi, Jeehong Kim and Kil To Chong
Cells 2020, 9(8), 1756; https://doi.org/10.3390/cells9081756 - 22 Jul 2020
Cited by 29 | Viewed by 4087
Abstract
N4-methylcytosine as one kind of modification of DNA has a critical role which alters genetic performance such as protein interactions, conformation, stability in DNA as well as the regulation of gene expression same cell developmental and genomic imprinting. Some different 4mC site identifiers [...] Read more.
N4-methylcytosine as one kind of modification of DNA has a critical role which alters genetic performance such as protein interactions, conformation, stability in DNA as well as the regulation of gene expression same cell developmental and genomic imprinting. Some different 4mC site identifiers have been proposed for various species. Herein, we proposed a computational model, DNC4mC-Deep, including six encoding techniques plus a deep learning model to predict 4mC sites in the genome of F. vesca, R. chinensis, and Cross-species dataset. It was demonstrated by the 10-fold cross-validation test to get superior performance. The DNC4mC-Deep obtained 0.829 and 0.929 of MCC on F. vesca and R. chinensis training dataset, respectively, and 0.814 on cross-species. This means the proposed method outperforms the state-of-the-art predictors at least 0.284 and 0.265 on F. vesca and R. chinensis training dataset in turn. Furthermore, the DNC4mC-Deep achieved 0.635 and 0.565 of MCC on F. vesca and R. chinensis independent dataset, respectively, and 0.562 on cross-species which shows it can achieve the best performance to predict 4mC sites as compared to the state-of-the-art predictor. Full article
(This article belongs to the Special Issue Biocomputing and Synthetic Biology in Cells)
Show Figures

Figure 1

16 pages, 4567 KiB  
Article
GPS-PBS: A Deep Learning Framework to Predict Phosphorylation Sites that Specifically Interact with Phosphoprotein-Binding Domains
by Yaping Guo, Wanshan Ning, Peiran Jiang, Shaofeng Lin, Chenwei Wang, Xiaodan Tan, Lan Yao, Di Peng and Yu Xue
Cells 2020, 9(5), 1266; https://doi.org/10.3390/cells9051266 - 20 May 2020
Cited by 10 | Viewed by 3978
Abstract
Protein phosphorylation is essential for regulating cellular activities by modifying substrates at specific residues, which frequently interact with proteins containing phosphoprotein-binding domains (PPBDs) to propagate the phosphorylation signaling into downstream pathways. Although massive phosphorylation sites (p-sites) have been reported, most of their interacting [...] Read more.
Protein phosphorylation is essential for regulating cellular activities by modifying substrates at specific residues, which frequently interact with proteins containing phosphoprotein-binding domains (PPBDs) to propagate the phosphorylation signaling into downstream pathways. Although massive phosphorylation sites (p-sites) have been reported, most of their interacting PPBDs are unknown. Here, we collected 4458 known PPBD-specific binding p-sites (PBSs), considerably improved our previously developed group-based prediction system (GPS) algorithm, and implemented a deep learning plus transfer learning strategy for model training. Then, we developed a new online service named GPS-PBS, which can hierarchically predict PBSs of 122 single PPBD clusters belonging to two groups and 16 families. By comparison, GPS-PBS achieved a highly competitive accuracy against other existing tools. Using GPS-PBS, we predicted 371,018 mammalian p-sites that potentially interact with at least one PPBD, and revealed that various PPBD-containing proteins (PPCPs) and protein kinases (PKs) can simultaneously regulate the same p-sites to orchestrate important pathways, such as the PI3K-Akt signaling pathway. Taken together, we anticipate GPS-PBS can be a great help for further dissecting phosphorylation signaling networks. Full article
(This article belongs to the Special Issue Biocomputing and Synthetic Biology in Cells)
Show Figures

Figure 1

17 pages, 2499 KiB  
Article
Systematic Identification of Housekeeping Genes Possibly Used as References in Caenorhabditis elegans by Large-Scale Data Integration
by Jingxin Tao, Youjin Hao, Xudong Li, Huachun Yin, Xiner Nie, Jie Zhang, Boying Xu, Qiao Chen and Bo Li
Cells 2020, 9(3), 786; https://doi.org/10.3390/cells9030786 - 24 Mar 2020
Cited by 13 | Viewed by 5268
Abstract
For accurate gene expression quantification, normalization of gene expression data against reliable reference genes is required. It is known that the expression levels of commonly used reference genes vary considerably under different experimental conditions, and therefore, their use for data normalization is limited. [...] Read more.
For accurate gene expression quantification, normalization of gene expression data against reliable reference genes is required. It is known that the expression levels of commonly used reference genes vary considerably under different experimental conditions, and therefore, their use for data normalization is limited. In this study, an unbiased identification of reference genes in Caenorhabditis elegans was performed based on 145 microarray datasets (2296 gene array samples) covering different developmental stages, different tissues, drug treatments, lifestyle, and various stresses. As a result, thirteen housekeeping genes (rps-23, rps-26, rps-27, rps-16, rps-2, rps-4, rps-17, rpl-24.1, rpl-27, rpl-33, rpl-36, rpl-35, and rpl-15) with enhanced stability were comprehensively identified by using six popular normalization algorithms and RankAggreg method. Functional enrichment analysis revealed that these genes were significantly overrepresented in GO terms or KEGG pathways related to ribosomes. Validation analysis using recently published datasets revealed that the expressions of newly identified candidate reference genes were more stable than the commonly used reference genes. Based on the results, we recommended using rpl-33 and rps-26 as the optimal reference genes for microarray and rps-2 and rps-4 for RNA-sequencing data validation. More importantly, the most stable rps-23 should be a promising reference gene for both data types. This study, for the first time, successfully displays a large-scale microarray data driven genome-wide identification of stable reference genes for normalizing gene expression data and provides a potential guideline on the selection of universal internal reference genes in C. elegans, for quantitative gene expression analysis. Full article
(This article belongs to the Special Issue Biocomputing and Synthetic Biology in Cells)
Show Figures

Figure 1

16 pages, 9280 KiB  
Article
Computational Identification and Analysis of Ubiquinone-Binding Proteins
by Chang Lu, Wenjie Jiang, Hang Wang, Jinxiu Jiang, Zhiqiang Ma and Han Wang
Cells 2020, 9(2), 520; https://doi.org/10.3390/cells9020520 - 24 Feb 2020
Cited by 3 | Viewed by 3199
Abstract
Ubiquinone is an important cofactor that plays vital and diverse roles in many biological processes. Ubiquinone-binding proteins (UBPs) are receptor proteins that dock with ubiquinones. Analyzing and identifying UBPs via a computational approach will provide insights into the pathways associated with ubiquinones. In [...] Read more.
Ubiquinone is an important cofactor that plays vital and diverse roles in many biological processes. Ubiquinone-binding proteins (UBPs) are receptor proteins that dock with ubiquinones. Analyzing and identifying UBPs via a computational approach will provide insights into the pathways associated with ubiquinones. In this work, we were the first to propose a UBPs predictor (UBPs-Pred). The optimal feature subset selected from three categories of sequence-derived features was fed into the extreme gradient boosting (XGBoost) classifier, and the parameters of XGBoost were tuned by multi-objective particle swarm optimization (MOPSO). The experimental results over the independent validation demonstrated considerable prediction performance with a Matthews correlation coefficient (MCC) of 0.517. After that, we analyzed the UBPs using bioinformatics methods, including the statistics of the binding domain motifs and protein distribution, as well as an enrichment analysis of the gene ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway. Full article
(This article belongs to the Special Issue Biocomputing and Synthetic Biology in Cells)
Show Figures

Figure 1

22 pages, 3412 KiB  
Article
PVPred-SCM: Improved Prediction and Analysis of Phage Virion Proteins Using a Scoring Card Method
by Phasit Charoenkwan, Sakawrat Kanthawong, Nalini Schaduangrat, Janchai Yana and Watshara Shoombuatong
Cells 2020, 9(2), 353; https://doi.org/10.3390/cells9020353 - 3 Feb 2020
Cited by 45 | Viewed by 3953
Abstract
Although, existing methods have been successful in predicting phage (or bacteriophage) virion proteins (PVPs) using various types of protein features and complex classifiers, such as support vector machine and naïve Bayes, these two methods do not allow interpretability. However, the characterization and analysis [...] Read more.
Although, existing methods have been successful in predicting phage (or bacteriophage) virion proteins (PVPs) using various types of protein features and complex classifiers, such as support vector machine and naïve Bayes, these two methods do not allow interpretability. However, the characterization and analysis of PVPs might be of great significance to understanding the molecular mechanisms of bacteriophage genetics and the development of antibacterial drugs. Hence, we herein proposed a novel method (PVPred-SCM) based on the scoring card method (SCM) in conjunction with dipeptide composition to identify and characterize PVPs. In PVPred-SCM, the propensity scores of 400 dipeptides were calculated using the statistical discrimination approach. Rigorous independent validation test showed that PVPred-SCM utilizing only dipeptide composition yielded an accuracy of 77.56%, indicating that PVPred-SCM performed well relative to the state-of-the-art method utilizing a number of protein features. Furthermore, the propensity scores of dipeptides were used to provide insights into the biochemical and biophysical properties of PVPs. Upon comparison, it was found that PVPred-SCM was superior to the existing methods considering its simplicity, interpretability, and implementation. Finally, in an effort to facilitate high-throughput prediction of PVPs, we provided a user-friendly web-server for identifying the likelihood of whether or not these sequences are PVPs. It is anticipated that PVPred-SCM will become a useful tool or at least a complementary existing method for predicting and analyzing PVPs. Full article
(This article belongs to the Special Issue Biocomputing and Synthetic Biology in Cells)
Show Figures

Graphical abstract

14 pages, 1365 KiB  
Article
Computational Detection of Breast Cancer Invasiveness with DNA Methylation Biomarkers
by Chunyu Wang, Ning Zhao, Linlin Yuan and Xiaoyan Liu
Cells 2020, 9(2), 326; https://doi.org/10.3390/cells9020326 - 30 Jan 2020
Cited by 7 | Viewed by 3426
Abstract
Breast cancer is the most common female malignancy. It has high mortality, primarily due to metastasis and recurrence. Patients with invasive and noninvasive breast cancer require different treatments, so there is an urgent need for predictive tools to guide clinical decision making and [...] Read more.
Breast cancer is the most common female malignancy. It has high mortality, primarily due to metastasis and recurrence. Patients with invasive and noninvasive breast cancer require different treatments, so there is an urgent need for predictive tools to guide clinical decision making and avoid overtreatment of noninvasive breast cancer and undertreatment of invasive cases. Here, we divided the sample set based on the genome-wide methylation distance to make full use of metastatic cancer data. Specifically, we implemented two differential methylation analysis methods to identify specific CpG sites. After effective dimensionality reduction, we constructed a methylation-based classifier using the Random Forest algorithm to categorize the primary breast cancer. We took advantage of breast cancer (BRCA) HM450 DNA methylation data and accompanying clinical data from The Cancer Genome Atlas (TCGA) database to validate the performance of the classifier. Overall, this study demonstrates DNA methylation as a potential biomarker to predict breast tumor invasiveness and as a possible parameter that could be included in the studies aiming to predict breast cancer aggressiveness. However, more comparative studies are needed to assess its usability in the clinic. Towards this, we developed a website based on these algorithms to facilitate its use in studies and predictions of breast cancer invasiveness. Full article
(This article belongs to the Special Issue Biocomputing and Synthetic Biology in Cells)
Show Figures

Figure 1

13 pages, 1165 KiB  
Article
Improving the Quantification of DNA Sequences Using Evolutionary Information Based on Deep Learning
by Hilal Tayara and Kil To Chong
Cells 2019, 8(12), 1635; https://doi.org/10.3390/cells8121635 - 14 Dec 2019
Cited by 18 | Viewed by 3994
Abstract
It is known that over 98% of the human genome is non-coding, and 93% of disease associated variants are located in these regions. Therefore, understanding the function of these regions is important. However, this task is challenging as most of these regions are [...] Read more.
It is known that over 98% of the human genome is non-coding, and 93% of disease associated variants are located in these regions. Therefore, understanding the function of these regions is important. However, this task is challenging as most of these regions are not well understood in terms of their functions. In this paper, we introduce a novel computational model based on deep neural networks, called DQDNN, for quantifying the function of non-coding DNA regions. This model combines convolution layers for capturing regularity motifs at multiple scales and recurrent layers for capturing long term dependencies between the captured motifs. In addition, we show that integrating evolutionary information with raw genomic sequences improves the performance of the predictor significantly. The proposed model outperforms the state-of-the-art ones using raw genomics sequences only and also by integrating evolutionary information with raw genomics sequences. More specifically, the proposed model improves 96.9% and 98% of the targets in terms of area under the receiver operating characteristic curve and the precision-recall curve, respectively. In addition, the proposed model improved the prioritization of functional variants of expression quantitative trait loci (eQTLs) compared with the state-of-the-art models. Full article
(This article belongs to the Special Issue Biocomputing and Synthetic Biology in Cells)
Show Figures

Figure 1

12 pages, 626 KiB  
Article
Predicting Disease Related microRNA Based on Similarity and Topology
by Zhihua Chen, Xinke Wang, Peng Gao, Hongju Liu and Bosheng Song
Cells 2019, 8(11), 1405; https://doi.org/10.3390/cells8111405 - 7 Nov 2019
Cited by 12 | Viewed by 2969
Abstract
It is known that many diseases are caused by mutations or abnormalities in microRNA (miRNA). The usual method to predict miRNA disease relationships is to build a high-quality similarity network of diseases and miRNAs. All unobserved associations are ranked by their similarity scores, [...] Read more.
It is known that many diseases are caused by mutations or abnormalities in microRNA (miRNA). The usual method to predict miRNA disease relationships is to build a high-quality similarity network of diseases and miRNAs. All unobserved associations are ranked by their similarity scores, such that a higher score indicates a greater probability of a potential connection. However, this approach does not utilize information within the network. Therefore, in this study, we propose a machine learning method, called STIM, which uses network topology information to predict disease–miRNA associations. In contrast to the conventional approach, STIM constructs features according to information on similarity and topology in networks and then uses a machine learning model to predict potential associations. To verify the reliability and accuracy of our method, we compared STIM to other classical algorithms. The results of fivefold cross validation demonstrated that STIM outperforms many existing methods, particularly in terms of the area under the curve. In addition, the top 30 candidate miRNAs recommended by STIM in a case study of lung neoplasm have been confirmed in previous experiments, which proved the validity of the method. Full article
(This article belongs to the Special Issue Biocomputing and Synthetic Biology in Cells)
Show Figures

Figure 1

15 pages, 3111 KiB  
Article
4mCpred-EL: An Ensemble Learning Framework for Identification of DNA N4-Methylcytosine Sites in the Mouse Genome
by Balachandran Manavalan, Shaherin Basith, Tae Hwan Shin, Da Yeon Lee, Leyi Wei and Gwang Lee
Cells 2019, 8(11), 1332; https://doi.org/10.3390/cells8111332 - 28 Oct 2019
Cited by 83 | Viewed by 4456
Abstract
DNA N4-methylcytosine (4mC) is one of the key epigenetic alterations, playing essential roles in DNA replication, differentiation, cell cycle, and gene expression. To better understand 4mC biological functions, it is crucial to gain knowledge on its genomic distribution. In recent times, [...] Read more.
DNA N4-methylcytosine (4mC) is one of the key epigenetic alterations, playing essential roles in DNA replication, differentiation, cell cycle, and gene expression. To better understand 4mC biological functions, it is crucial to gain knowledge on its genomic distribution. In recent times, few computational studies, in particular machine learning (ML) approaches have been applied in the prediction of 4mC site predictions. Although ML-based methods are promising for 4mC identification in other species, none are available for detecting 4mCs in the mouse genome. Our novel computational approach, called 4mCpred-EL, is the first method for identifying 4mC sites in the mouse genome where four different ML algorithms with a wide range of seven feature encodings are utilized. Subsequently, those feature encodings predicted probabilistic values are used as a feature vector and are once again inputted to ML algorithms, whose corresponding models are integrated into ensemble learning. Our benchmarking results demonstrated that 4mCpred-EL achieved an accuracy and MCC values of 0.795 and 0.591, which significantly outperformed seven other classifiers by more than 1.5–5.9% and 3.2–11.7%, respectively. Additionally, 4mCpred-EL attained an overall accuracy of 79.80%, which is 1.8–5.1% higher than that yielded by seven other classifiers in the independent evaluation. We provided a user-friendly web server, namely 4mCpred-EL which could be implemented as a pre-screening tool for the identification of potential 4mC sites in the mouse genome. Full article
(This article belongs to the Special Issue Biocomputing and Synthetic Biology in Cells)
Show Figures

Graphical abstract

17 pages, 2413 KiB  
Article
A Novel Computational Model for Predicting microRNA–Disease Associations Based on Heterogeneous Graph Convolutional Networks
by Chunyan Li, Hongju Liu, Qian Hu, Jinlong Que and Junfeng Yao
Cells 2019, 8(9), 977; https://doi.org/10.3390/cells8090977 - 26 Aug 2019
Cited by 37 | Viewed by 5567
Abstract
Identifying the interactions between disease and microRNA (miRNA) can accelerate drugs development, individualized diagnosis, and treatment for various human diseases. However, experimental methods are time-consuming and costly. So computational approaches to predict latent miRNA–disease interactions are eliciting increased attention. But most previous studies [...] Read more.
Identifying the interactions between disease and microRNA (miRNA) can accelerate drugs development, individualized diagnosis, and treatment for various human diseases. However, experimental methods are time-consuming and costly. So computational approaches to predict latent miRNA–disease interactions are eliciting increased attention. But most previous studies have mainly focused on designing complicated similarity-based methods to predict latent interactions between miRNAs and diseases. In this study, we propose a novel computational model, termed heterogeneous graph convolutional network for miRNA–disease associations (HGCNMDA), which is based on known human protein–protein interaction (PPI) and integrates four biological networks: miRNA–disease, miRNA–gene, disease–gene, and PPI network. HGCNMDA achieved reliable performance using leave-one-out cross-validation (LOOCV). HGCNMDA is then compared to three state-of-the-art algorithms based on five-fold cross-validation. HGCNMDA achieves an AUC of 0.9626 and an average precision of 0.9660, respectively, which is ahead of other competitive algorithms. We further analyze the top-10 unknown interactions between miRNA and disease. In summary, HGCNMDA is a useful computational model for predicting miRNA–disease interactions. Full article
(This article belongs to the Special Issue Biocomputing and Synthetic Biology in Cells)
Show Figures

Graphical abstract

17 pages, 3140 KiB  
Article
Ensemble of Deep Recurrent Neural Networks for Identifying Enhancers via Dinucleotide Physicochemical Properties
by Kok Keng Tan, Nguyen Quoc Khanh Le, Hui-Yuan Yeh and Matthew Chin Heng Chua
Cells 2019, 8(7), 767; https://doi.org/10.3390/cells8070767 - 23 Jul 2019
Cited by 30 | Viewed by 5051
Abstract
Enhancers are short deoxyribonucleic acid fragments that assume an important part in the genetic process of gene expression. Due to their possibly distant location relative to the gene that is acted upon, the identification of enhancers is difficult. There are many published works [...] Read more.
Enhancers are short deoxyribonucleic acid fragments that assume an important part in the genetic process of gene expression. Due to their possibly distant location relative to the gene that is acted upon, the identification of enhancers is difficult. There are many published works focused on identifying enhancers based on their sequence information, however, the resulting performance still requires improvements. Using deep learning methods, this study proposes a model ensemble of classifiers for predicting enhancers based on deep recurrent neural networks. The input features of deep ensemble networks were generated from six types of dinucleotide physicochemical properties, which had outperformed the other features. In summary, our model which used this ensemble approach could identify enhancers with achieved sensitivity of 75.5%, specificity of 76%, accuracy of 75.5%, and MCC of 0.51. For classifying enhancers into strong or weak sequences, our model reached sensitivity of 83.15%, specificity of 45.61%, accuracy of 68.49%, and MCC of 0.312. Compared to the benchmark result, our results had higher performance in term of most measurement metrics. The results showed that deep model ensembles hold the potential for improving on the best results achieved to date using shallow machine learning methods. Full article
(This article belongs to the Special Issue Biocomputing and Synthetic Biology in Cells)
Show Figures

Figure 1

15 pages, 2106 KiB  
Article
Convolutional Neural Network and Bidirectional Long Short-Term Memory-Based Method for Predicting Drug–Disease Associations
by Ping Xuan, Yilin Ye, Tiangang Zhang, Lianfeng Zhao and Chang Sun
Cells 2019, 8(7), 705; https://doi.org/10.3390/cells8070705 - 11 Jul 2019
Cited by 40 | Viewed by 4645
Abstract
Identifying novel indications for approved drugs can accelerate drug development and reduce research costs. Most previous studies used shallow models for prioritizing the potential drug-related diseases and failed to deeply integrate the paths between drugs and diseases which may contain additional association information. [...] Read more.
Identifying novel indications for approved drugs can accelerate drug development and reduce research costs. Most previous studies used shallow models for prioritizing the potential drug-related diseases and failed to deeply integrate the paths between drugs and diseases which may contain additional association information. A deep-learning-based method for predicting drug–disease associations by integrating useful information is needed. We proposed a novel method based on a convolutional neural network (CNN) and bidirectional long short-term memory (BiLSTM)—CBPred—for predicting drug-related diseases. Our method deeply integrates similarities and associations between drugs and diseases, and paths among drug-disease pairs. The CNN-based framework focuses on learning the original representation of a drug-disease pair from their similarities and associations. As the drug-disease association possibility also depends on the multiple paths between them, the BiLSTM-based framework mainly learns the path representation of the drug-disease pair. In addition, considering that different paths have discriminate contributions to the association prediction, an attention mechanism at path level is constructed. Our method, CBPred, showed better performance and retrieved more real associations in the front of the results, which is more important for biologists. Case studies further confirmed that CBPred can discover potential drug-disease associations. Full article
(This article belongs to the Special Issue Biocomputing and Synthetic Biology in Cells)
Show Figures

Figure 1

Back to TopTop